**Neural networks** (also referred to as connectionist systems) are a computational approach (often used in the field of artificial-intelligence), which is based on a large collection of neural units (AKA artificial neurons), loosely modelling the way a biological brain solves problems with large clusters of biological neurons connected by axons. Each neural unit is connected with many others, and links can be enforcing or inhibitory in their effect on the activation state of connected neural units. Each individual neural unit may have a summation function which combines the values of all its inputs together. There may be a threshold function or limiting function on each connection and on the unit itself: such that the signal must surpass the limit before propagating to other neurons. These systems are self-learning and trained, rather than explicitly programmed, and excel in areas where the solution or feature detection is difficult to express in a traditional computer program.

Neural networks typically consist of multiple layers or a cube design, and the signal path traverses from front to back. Back propagation is where the forward stimulation is used to reset weights on the "front" neural units and this is sometimes done in combination with training where the correct result is known. More modern networks are a bit more free flowing in terms of stimulation and inhibition with connections interacting in a much more chaotic and complex fashion. Dynamic neural networks are the most advanced- in that they dynamically can, based on rules, form new connections and even new neural units while disabling others.

The goal of the neural network is to solve problems in the same way that the human brain would, although several neural networks are more abstract. Modern neural network projects typically work with a few thousand to a few million neural units and millions of connections, which is still several orders of magnitude less complex than the human brain and closer to the computing power of a worm.

New brain research often stimulates new patterns in neural networks. One new approach is using connections which span much further and link processing layers rather than always being localized to adjacent neurons. Other research being explored with the different types of signals over time that axons propagate, such as Deep Learning, interpolates greater complexity than a set of boolean variables being simply on or off.

Neural networks are based on real numbers, with the value of the core and of the axon typically being a representation between 0.0 and 1.

An interesting facet of these systems is that they are unpredictable in their success with self-learning. After training, some become great problem solvers and others don't perform as well. In order to train them, several thousand cycles of interaction typically occur.

Like other machine learning methods – systems that learn from data – neural networks have been used to solve a wide variety of tasks, like computer vision and speech recognition, that are hard to solve using ordinary rule-based programming.

Historically, the use of neural network models marked a directional shift in the late eighties from high-level (symbolic) artificial intelligence, characterized by expert systems with knowledge embodied in if-then rules, to low-level (sub-symbolic) machine learning, characterized by knowledge embodied in the parameters of a cognitive model with some dynamical system.

It is very clear that the recent years have brought on strong renewed interest in neural networks, especially with the dramatic advances in machine learning this is noticeable. Also, it is notable that the interest levels are internationally high across the globe (India, China and oddly Iran stand out).

Wikipedia]]>In computer science, **computational learning theory** (or just learning theory) is a subfield of Artificial Intelligence devoted to studying the design and analysis of machine learning algorithms.

Computational Learning Theory quiz as discussed by two Georgia Tech machine learning researchers.

Theoretical results in machine learning mainly deal with a type of inductive learning called supervised learning. In supervised learning, an algorithm is given samples that are labeled in some useful way. For example, the samples might be descriptions of mushrooms, and the labels could be whether or not the mushrooms are edible. The algorithm takes these previously labeled samples and uses them to induce a classifier. This classifier is a function that assigns labels to samples including the samples that have never been previously seen by the algorithm. The goal of the supervised learning algorithm is to optimize some measure of performance such as minimizing the number of mistakes made on new samples.

In addition to performance bounds, computational learning theory studies the time complexity and feasibility of learning. In computational learning theory, a computation is considered feasible if it can be done in polynomial time. There are two kinds of time complexity results:

- Positive results – Showing that a certain class of functions is learnable in polynomial time.
- Negative results – Showing that certain classes cannot be learned in polynomial time.

Negative results often rely on commonly believed, but yet unproven assumptions, such as:

- Computational complexity – P ≠ NP (the P versus NP problem);
- Cryptographic – One-way functions exist.

There are several different approaches to computational learning theory. These differences are based on making assumptions about the inference principles used to generalize from limited data. This includes different definitions of probability (see frequency probability, Bayesian probability) and different assumptions on the generation of samples.

Here is a graph of how much interest there has been in computational learning theory over time and where that interest is geographically focused. Most notable is that this somehow is only of real interest in India.

If you want to dive deeper into computational learning theory, you might want to consider these books.

]]>Among the most used adaptive algorithms is the Widrow-Hoff’s least mean squares (LMS), which represents a class of stochastic gradient-descent algorithms used in adaptive filtering and machine learning. In adaptive filtering, the LMS is used to mimic a desired filter by finding the filter coefficients that relate to producing the least mean square of the error signal ( the difference between the desired and the actual signal).

For example, stable partition, using no additional memory is O(n lg n) but given O(n) memory, it can be O(n) in time. As implemented by the C++ Standard Library, stable_partition is adaptive and so it acquires as much memory as it can get (up to what it would need at most) and applies the algorithm using that available memory. Another example is adaptive sort, whose behaviour changes upon the presortedness of its input.

An example of an adaptive algorithm in radar systems is the constant false alarm rate (CFAR) detector.

In machine learning and optimization, many algorithms are adaptive or have adaptive variants, which usually means that the algorithm parameters are automatically adjusted according to statistics about the optimisation thus far (e.g. the rate of convergence). Examples include adaptive simulated annealing, adaptive coordinate descent, AdaBoost, and adaptive quadrature.

In data compression, adaptive coding algorithms such as Adaptive Huffman coding or Prediction by partial matching can take a stream of data as input, and adapt their compression technique based on the symbols that they have already encountered.

In signal processing, the Adaptive Transform Acoustic Coding (ATRAC) codec used in MiniDisc recorders is called "adaptive" because the window length (the size of an audio "chunk") can change according to the nature of the sound being compressed, to try and achieve the best-sounding compression strategy.

]]>One problem for understanding action selection is determining the level of abstraction used for specifying an "act". At the most basic level of abstraction, an atomic act could be anything from contracting a muscle cell to provoking a war. Typically for any one action-selection mechanism, the set of possible actions is predefined and fixed.

Most researchers working in this field place high demands on their agents:

- The acting agent typically must select its action in dynamic and unpredictable environments.
- The agents typically act in real time; therefore they must make decisions in a timely fashion.
- The agents are normally created to perform several different tasks. These tasks may conflict for resource allocation (e.g. can the agent put out a fire and deliver a cup of coffee at the same time?)
- The environment the agents operate in may include humans, who may make things more difficult for the agent (either intentionally or by attempting to assist.)
- The agents themselves are often intended to model animals or humans, and animal/human behaviour is quite complicated.

For these reasons action selection is not trivial and attracts a good deal of research.

]]>Learning action models is important when goals change. When an agent acted for a while, it can use its accumulated knowledge about actions in the domain to make better decisions. Thus, learning action models differs from reinforcement learning. It enables reasoning about actions instead of expensive trials in the world. Action model learning is a form of inductive reasoning, where new knowledge is generated based on agent's observations. It differs from standard supervised learning in that correct input/output pairs are never presented, nor imprecise action models explicitly corrected.

Usual motivation for action model learning is the fact that manual specification of action models for planners is often a difficult, time consuming, and error-prone task (especially in complex environments).

]]>Action languages fall into two classes: action description languages and action query languages. Examples of the former include STRIPS, PDDL, Language A (a generalization of STRIPS; the propositional part of Pednault's ADL), Language B (an extension of A adding indirect effects, distinguishing static and dynamic laws) and Language C (which adds indirect effects also, and does not assume that every fluent is automatically "inertial"). There are also the Action Query Languages P, Q and R. Several different algorithms exist for converting action languages, and in particular, action language C, to answer set programs. Since modern answer-set solvers make use of boolean SAT algorithms to very rapidly ascertain satisfiability, this implies that action languages can also enjoy the progress being made in the domain of boolean SAT solving.

]]>Abstraction can apply to control or to data: Control abstraction is the abstraction of actions while data abstraction is that of data structures.

- Control abstraction involves the use of subroutines and control flow abstractions
- Data abstraction allows handling pieces of data in meaningful ways. For example, it is the basic motivation behind the datatype.

The notion of an object in object-oriented programming can be viewed as a way to combine abstractions of data and code.

The same abstract definition can be used as a common interface for a family of objects with different implementations and behaviors but which share the same meaning. The inheritance mechanism in object-oriented programming can be used to define an abstract class as the common interface.

The recommendation that programmers use abstractions whenever suitable in order to avoid duplication (usually of code) is known as the abstraction principle. The requirement that a programming language provide suitable abstractions is also called the abstraction principle.

]]>Formally, an ADT may be defined as a "class of objects whose logical behaviour is defined by a set of values and a set of operations"; this is analogous to an algebraic structure in mathematics. What is meant by "behaviour" varies by author, with the two main types of formal specifications for behaviour being axiomatic (algebraic) specification and an abstract model; these correspond to axiomatic semantics and operational semantics of an abstract machine, respectively. Some authors also include the computational complexity ("cost"), both in terms of time (for computing operations) and space (for representing values). In practice many common data types are not ADTs, as the abstraction is not perfect, and users must be aware of issues like arithmetic overflow that are due to the representation. For example, integers are often stored as fixed width values (32-bit or 64-bit binary numbers), and thus experience integer overflow if the maximum value is exceeded.

ADTs are a theoretical concept in computer science, used in the design and analysis of algorithms, data structures, and software systems, and do not correspond to specific features of computer languages—mainstream computer languages do not directly support formally specified ADTs. However, various language features correspond to certain aspects of ADTs, and are easily confused with ADTs proper; these include abstract types, opaque data types, protocols, and design by contract. ADTs were first proposed by Barbara Liskov and Stephen N. Zilles in 1974, as part of the development of the CLU language.

]]>In the 1990s, as computing power grew, the fields of law, computer science, and artificial intelligence research spurred renewed interest in the subject of abduction. Diagnostic expert systems frequently employ abduction.

]]>A capsule is a nested set of neural layers. So in a regular neural network, you keep on adding more layers. In a capsule network you would add more layers inside a single layer. Or in other words, nest a neural layer inside another. The state of the neurons inside a capsule capture the above properties of one entity inside an image. A capsule outputs a vector to represent the existence of the entity. The orientation of the vector represents the properties of the entity. The vector is sent to all possible parents in the neural network. For each possible parent, a capsule can find a prediction vector. Prediction vector is calculated based on multiplying its own weight and a weight matrix.

Whichever parent has the largest scalar prediction vector product, increases the capsule bond. Rest of the parents decrease their bond. This routing by agreement method is superior to the current mechanism like max-pooling. Max pooling routes based on the strongest feature detected in the lower layer. Apart from dynamic routing, CapsNet talks about adding squashing to a capsule. Squashing is a non-linearity. So instead of adding squashing to each layer like how you do in CNN, you add the squashing to a nested set of layers. So the squashing function gets applied to the vector output of each capsule.

Learn MoreHere are some news articles related to capsule networks.

]]>**Linear algebra** is the branch of mathematics concerning vector spaces and linear mappings between such spaces. It includes the study of lines, planes, and subspaces, but is also concerned with properties common to all vector spaces.

The set of points with coordinates that satisfy a linear equation forms a hyperplane in an n-dimensional space. The conditions under which a set of n hyperplanes intersect in a single point is an important focus of study in linear algebra. Such an investigation is initially motivated by a system of linear equations containing several unknowns. Such equations are naturally represented using the formalism of matrices and vectors.

Linear algebra is central to both pure and applied mathematics. For instance, abstract algebra arises by relaxing the axioms of a vector space, leading to a number of generalizations. Functional analysis studies the infinite-dimensional version of the theory of vector spaces. Combined with calculus, linear algebra facilitates the solution of linear systems of differential equations.

Techniques from linear algebra are also used in analytic geometry, engineering, physics, natural sciences, computer science, computer animation, advanced facial recognition algorithms and the social sciences (particularly in economics). Because linear algebra is such a well-developed theory, nonlinear mathematical models are sometimes approximated by linear models.

WikipediaIf you want to "learn" linear algebra Khan Academy and MIT OpenCourseWare (look for the video lectures) have great courses to offer for free.

]]>**Data science**, also known as data-driven science, is an interdisciplinary field about scientific processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, which is a continuation of some of the data analysis fields such as statistics, machine learning, data mining, and predictive analytics, similar to Knowledge Discovery in Databases (KDD).

Turing award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge.

WikipediaDelve deeper into the topic of data science with these books.

]]>**Unsupervised machine learning** is the machine learning task of inferring a function to describe hidden structure from unlabeled data. Since the examples given to the learner are unlabeled, there is no error or reward signal to evaluate a potential solution – this distinguishes unsupervised learning from supervised learning and reinforcement learning.

Unsupervised Learning simply explained by Georgia Tech scientists on Machine Learning.

**Unsupervised learning** is closely related to the problem of density estimation in statistics. However, unsupervised learning also encompasses many other techniques that seek to summarize and explain key features of the data.

Unsupervised learning is one of the three main areas of machine learning:

WikipediaHere are a few books from Amazon on the topic of unsupervised learning, naturally.

]]>**Supervised learning** is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. An optimal scenario will allow for the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way (see inductive bias).

Supervised Learning explained by Georgia Tech researchers on Machine Learning.

The parallel task in human and animal psychology is often referred to as concept learning.

Supervised learning is one of the three main areas in machine learning:

WikipediaHere are a couple of books related to supervised machine learning on Amazon.

]]>**Reinforcement learning** is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. The problem, due to its generality, is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics, and genetic algorithms. In the operations research and control literature, the field where reinforcement learning methods are studied is called approximate dynamic programming. The problem has been studied in the theory of optimal control, though most studies are concerned with the existence of optimal solutions and their characterization, and not with the learning or approximation aspects. In economics and game theory, **reinforcement learning** may be used to explain how equilibrium may arise under bounded rationality.

Reinforcement Learning explained by two Georgia Tech experts on Machine Learning.

In machine learning, the environment is typically formulated as a Markov decision process (MDP) as many reinforcement learning algorithms for this context utilize dynamic programming techniques. The main difference between the classical techniques and reinforcement learning algorithms is that the latter do not need knowledge about the MDP and they target large MDPs where exact methods become infeasible.

Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected. Further, there is a focus on on-line performance, which involves finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The exploration vs. exploitation trade-off in reinforcement learning has been most thoroughly studied through the multi-armed bandit problem and in finite MDPs.

Reinforcement learning is one of the three main areas in machine learning:

WikipediaHere are a couple of book suggestions to either get you started with reinforcement learning, or even help you improve your current skills and knowledge in the field.

]]>In machine learning, a **convolutional neural network** (CNN, or ConvNet) is a class of deep, feed-forward artificial neural network that has successfully been applied to analyzing visual imagery.

CNNs use a variation of multilayer perceptrons designed to require minimal preprocessing. They are also known as shift invariant or space invariant artificial neural networks (SIANN), based on their shared-weights architecture and translation invariance characteristics.

Convolutional networks were inspired by biological processes in which the connectivity pattern between neurons is inspired by the organization of the animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap such that they cover the entire visual field.

A gentle guided tour of Convolutional Neural Networks. Come lift the curtain and see how the magic is done. For slides and text, check out the accompanying blog post: http://brohrer.github.io/how_convolutional_neural_networks_work.html

CNNs use relatively little pre-processing compared to other image classification algorithms. This means that the network learns the filters that in traditional algorithms were hand-engineered. This independence from prior knowledge and human effort in feature design is a major advantage.

They have applications in image and video recognition, recommender systems and natural language processing.

]]>**Computer vision** is an interdisciplinary field that deals with how computers can be made for gaining high-level understanding from digital images or videos. From the perspective of engineering, it seeks to automate tasks that the human visual system can do.

Computer vision tasks include methods for acquiring, processing, analyzing and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the forms of decisions. Understanding in this context means the transformation of visual images (the input of the retina) into descriptions of the world that can interface with other thought processes and elicit appropriate action. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory.

As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from images. The image data can take many forms, such as video sequences, views from multiple cameras, or multi-dimensional data from a medical scanner. As a technological discipline, computer vision seeks to apply its theories and models for the construction of computer vision systems.

Sub-domains of computer vision include scene reconstruction, event detection, video tracking, object recognition, 3D pose estimation, learning, indexing, motion estimation, and image restoration.

]]>