A toolbox of principled learning paradigms

By Haitham Bou-Ammar

“Originality often consists of linking up ideas whose connection was not previously suspected.” W.I. Beveridge – The Art of Scientific Investigation.

Combining ideas in original ways is the essence of creativity. Our team at PROWLER.io combines diverse learning paradigms in new ways to teach autonomous computer agents to make safe, effective decisions. We partition decision-making into its base constituents and tailor learning methodologies to different parts of the process. Unlike DeepMind, which focuses mostly on deep learning and has become very proficient at recognition tasks on big data, we believe that a more diverse toolbox enables us to deliver interpretable, efficient, principled AI for a broad range of applications.

This view has been validated by our Chairman, Prof. Carl Edward Rasmussen. He has shown that a principled technique for decision-making can solve a real-world benchmark (an inverted pendulum task) in about 17.5 seconds, compared with the 600 seconds needed to train deep neural networks on a simpler, simulated version of the same problem. That is roughly a 34x advantage (600 / 17.5 ≈ 34.3).

The scientific community categorises traditional machine learning into three subfields that differ in the data available and the problems considered.

  1. Supervised learning: the learner seeks a relationship between input and output events encoded in a given labelled data set (e.g., predicting weather patterns from historical data).
  2. Unsupervised learning: the learner is only provided with input events, and the goal is to discover “interesting” patterns (e.g., clusters or features).
  3. Reinforcement learning: an agent interacts with an unknown environment to learn an optimal behavioural rule that maximises a reward signal.

One can further categorise each of the above fields based on how the model is encoded. Under supervised learning, we can differentiate parametric and non-parametric encoding methods. In parametric techniques, the overall shape of the relation between input and output events is assumed a priori. A widely known special case within this category is deep neural networks. Here, the programmer makes design assumptions (e.g., number of layers, number of units per layer) and a learning algorithm then tunes the remaining parameters to determine the input-output relation. Though widely adopted, parametric function approximators lose expressiveness when considering arbitrary classes of relations, because their approximation capabilities are necessarily bounded by the design choices initially imposed by the programmer.
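The distinction above can be illustrated with a toy sketch. It is not taken from PROWLER.io's work: the straight line stands in for the simplest possible parametric model, the Nadaraya-Watson kernel smoother stands in for a non-parametric one, and the sine-wave data, the function names, and the bandwidth value are all assumptions chosen for illustration.

```python
import math

def fit_line(xs, ys):
    """Least-squares fit of the fixed parametric form y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def kernel_smoother(xs, ys, x_query, bandwidth=0.3):
    """Nadaraya-Watson estimate: a locally weighted average of the targets.

    No global shape is assumed; the prediction is driven by nearby data.
    The bandwidth is a hypothetical choice for this toy problem.
    """
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

def mean_squared_error(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# Toy data: a noise-free sine wave, which no straight line can represent.
xs = [i * 0.1 for i in range(63)]  # 0.0, 0.1, ..., 6.2
ys = [math.sin(x) for x in xs]

a, b = fit_line(xs, ys)
line_mse = mean_squared_error([a * x + b for x in xs], ys)
smooth_mse = mean_squared_error([kernel_smoother(xs, ys, x) for x in xs], ys)

print(f"straight line MSE:   {line_mse:.4f}")
print(f"kernel smoother MSE: {smooth_mse:.4f}")
```

However well the line's two parameters are tuned, its error stays bounded below by the mismatch between the assumed shape and the data, whereas the smoother's flexibility comes from the data themselves.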

But machine learning seeks smart and generalisable algorithms. We should be able to do better than parametric techniques. Building on fundamentals of data science rooted in statistics, we believe that accurate probabilistic modelling is key. To remedy the problems of parametric methods, we focus our attention on scaling non-parametric methods to large, high-dimensional data.

Some of our methods can be seen as generalisations of those followed by others. We consider infinite-dimensional extensions to deep networks through the use of Gaussian processes and efficient probabilistic models. This gives us richer classes of models. Apart from the gain in approximation power, using statistics to analyse data enables artificial intelligence with uncertainty estimates, a tool that researchers rely on in most other fields of science but that remains unsupported by current deep-learning technology.
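To make the idea of uncertainty estimates concrete, here is a minimal sketch of textbook Gaussian process regression in plain Python. It is an illustration only, not PROWLER.io's implementation: the squared-exponential kernel, the unit length scale, the noise level, and all function names are assumptions, and the hand-written Cholesky solver is only meant for tiny problems.

```python
import math

def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def cholesky(matrix):
    """Lower-triangular factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower

def forward_substitution(lower, rhs):
    """Solve lower @ z = rhs."""
    z = []
    for i, row in enumerate(lower):
        z.append((rhs[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z

def backward_substitution(lower, rhs):
    """Solve lower.T @ z = rhs."""
    n = len(lower)
    z = [0.0] * n
    for i in reversed(range(n)):
        acc = sum(lower[k][i] * z[k] for k in range(i + 1, n))
        z[i] = (rhs[i] - acc) / lower[i][i]
    return z

def gp_posterior(train_x, train_y, query_x, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP at one query point."""
    n = len(train_x)
    gram = [
        [rbf_kernel(train_x[i], train_x[j]) + (noise if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    lower = cholesky(gram)
    # alpha = (K + noise * I)^-1 y, computed via the Cholesky factor.
    alpha = backward_substitution(lower, forward_substitution(lower, train_y))
    k_star = [rbf_kernel(x, query_x) for x in train_x]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_substitution(lower, k_star)
    variance = rbf_kernel(query_x, query_x) - sum(e * e for e in v)
    return mean, variance

# Four observations of a sine wave (toy data, assumed for illustration).
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [math.sin(x) for x in train_x]

mean_near, var_near = gp_posterior(train_x, train_y, 1.5)   # between data points
mean_far, var_far = gp_posterior(train_x, train_y, 10.0)    # far from any data

print(f"x = 1.5:  mean {mean_near:.3f}, variance {var_near:.3f}")
print(f"x = 10.0: mean {mean_far:.3f}, variance {var_far:.3f}")
```

The predictive variance is small between observations and returns to the prior variance far from them, so the model reports where it does and does not know the answer.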

One notable success of reinforcement learning is the development of deep Q-networks (DQNs), which achieve human-level performance on various Atari games from visual inputs. Since their development, DQNs have emerged as a powerful technique for decision-making from high-dimensional input representations (e.g., images). To achieve this, DeepMind again applies ideas from parametric supervised learning to the task of reinforcement learning. Although successful in some applications, DQNs are tabula-rasa learners that acquire experience through random interaction with the environment. For this reason, DQNs fail on complex problems with “bottleneck” states, such as navigating a maze with multiple doors or search-and-rescue scenarios. Here, the probability of randomly choosing the correct action for crossing a door or finding a casualty is extremely low. As a result, the successful behaviour that reinforces learning is extremely scarce and DQNs struggle.
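A back-of-the-envelope calculation shows how quickly random exploration breaks down at a bottleneck. The sketch below rests on simplifying assumptions that are not in the original text: exploration is purely uniform over the actions, passing the bottleneck requires one exact sequence of actions, and each episode is a single independent attempt. The function names and the numbers (4 actions, 10 steps) are hypothetical.

```python
def success_probability(num_actions, sequence_length):
    """Chance that uniform random actions reproduce one exact sequence."""
    return (1.0 / num_actions) ** sequence_length

def expected_episodes_until_success(num_actions, sequence_length):
    """Mean of the geometric distribution over independent attempts: 1 / p."""
    return 1.0 / success_probability(num_actions, sequence_length)

# Hypothetical maze door: 4 possible moves, 10 correct moves in a row needed.
p_door = success_probability(num_actions=4, sequence_length=10)
episodes = expected_episodes_until_success(num_actions=4, sequence_length=10)

print(f"probability per episode: {p_door:.2e}")
print(f"expected episodes before the first success: {episodes:,.0f}")
```

Under these assumptions the success probability shrinks exponentially with the length of the required sequence, which is why rewarding experience becomes so scarce.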

So what’s the problem? It’s clear from the fundamentals of human psychology that the problem originates from the tabula rasa, or blank slate, assumption inherent in reinforcement learning. We humans almost never learn from scratch; we transfer knowledge from other minds and the environment. As Leberman et al. state, “transfer is a core concept in learning...it helps us learn by facilitating storage, processing, remembering...it is therefore the very essence of understanding, interacting, and creating. Every time learning occurs previous learning is used as a building block.” Although knowledge reuse between different problems is a fundamental feature of human learning, current machine learning technology has yet to support robust and efficient transfer between problems. Rather than just relying on increasing computational power, at PROWLER.io we fuse knowledge-reuse algorithms with reinforcement learning by employing combinations of the following technologies:

  1. Transfer Learning: In TL, the agent attempts to transfer knowledge from one source problem to a corresponding target.
  2. Multitask Learning: In MTL, the agent attempts to learn multiple problems simultaneously for better generalisation to novel tasks.
  3. Lifelong Learning: In LFL, the agent learns to learn by acting against an unknown adversary with tasks streamed sequentially.
  4. Information-Theoretic Bounded Rationality: In Info-RL, the agent improves its learning by bounding its behaviour with respect to prior knowledge.
  5. Inverse Reinforcement Learning: In IRL, the agent tries to recover human behaviour in search of better generalisation.
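As one illustration of how prior knowledge can bound behaviour (item 4 above), the sketch below uses a KL-regularised action rule that is common in the information-theoretic bounded rationality literature: the policy is proportional to the prior times exp(beta * value). This is an assumed, generic formulation and not necessarily the method used at PROWLER.io; the prior, the action values, and the function name are hypothetical.

```python
import math

def bounded_rational_policy(prior, action_values, beta):
    """Return pi(a) proportional to prior(a) * exp(beta * value(a)).

    beta = 0 reproduces the prior exactly; a larger beta lets the agent
    deviate further from the prior in favour of high-value actions.
    """
    max_value = max(action_values)  # subtracted for numerical stability
    unnormalised = [
        p * math.exp(beta * (q - max_value))
        for p, q in zip(prior, action_values)
    ]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

# Hypothetical prior behaviour and value estimates for three actions.
prior = [0.7, 0.2, 0.1]
action_values = [0.0, 1.0, 0.5]

cautious = bounded_rational_policy(prior, action_values, beta=0.0)
moderate = bounded_rational_policy(prior, action_values, beta=2.0)
greedy = bounded_rational_policy(prior, action_values, beta=50.0)

print("beta = 0 :", [round(p, 3) for p in cautious])
print("beta = 2 :", [round(p, 3) for p in moderate])
print("beta = 50:", [round(p, 3) for p in greedy])
```

The single parameter beta trades off trust in prior knowledge against the newly estimated values, so the agent does not have to start from a blank slate.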

Going back to fundamentals is what differentiates us at PROWLER.io. It is the synergy of a multitude of subfields that enables us to design the world's first truly autonomous artificial intelligence agents.
