In today's digitized world, machine learning (ML) algorithms are making our lives and work easier. They free us from redundant, menial tasks and let us work faster. They put accurate predictive information at our fingertips and help teams quickly visualize the road ahead while collaborating securely.
If you are already familiar with the buzzword 'machine learning,' you have probably also heard related terms such as reinforcement learning, deep learning, supervised learning, and unsupervised learning. This blog briefly deciphers the world of Reinforcement Learning (RL) in particular, touching on the others to put their differences in context. As a deep dive into RL, we will assess its applications in real-world domains and the challenges they raise, including data efficiency, stability, and multi-tasking. We will also look at how decision-makers can leverage the technology, and conduct research in a scalable way, to achieve the desired business outcomes.
What makes RL unique in its application
In machine learning, we, as users, let the system learn by itself, identifying patterns in data sets and offering predictions. You can train a system to tell the difference between shorts and full-length trousers, for example, by feeding it labeled images from the web. Deep learning is a subset of ML. Here, too, the machine learns by itself, except the algorithms use artificial neural networks inspired by the human brain. It requires massive data sets as well as (you guessed it) high computational power.
Finally, reinforcement learning is another subset of machine learning (like deep learning), except that it observes its environment and performs tasks by trial and error to maximize the success of its outcomes.
Figure 1 – Source: https://mc.ai/introducing-deep-reinforcement-learning/
Also, unlike the other two learning frameworks, the goal of RL is not to cluster or label data but to find the ideal sequence of actions that achieves a desired goal. RL approaches the problem by letting an 'agent' discover, interact with, and learn dynamically from its environment. Based on its observations and patterns of success, a function called the 'policy' decides which actions to take. Think of it as a fancy version of classic trial and error.
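The agent–policy–environment loop described above can be sketched in a few lines of Python. Everything here is illustrative: `ToyEnvironment` is a hypothetical two-state world (not any real RL library), and the policy is a trivial random one, just to show how actions flow to the environment and rewards flow back.

```python
import random

class ToyEnvironment:
    """A hypothetical two-state environment: the agent tries to reach state 1."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # Action 1 moves toward the goal and earns a reward; action 0 stays put.
        if action == 1:
            self.state = 1
            return self.state, 1.0
        return self.state, 0.0

def random_policy(state):
    """A stand-in policy: pick an action at random, ignoring the state."""
    return random.choice([0, 1])

# The agent-environment interaction loop: observe, act, receive a reward.
env = ToyEnvironment()
total_reward = 0.0
for _ in range(10):
    action = random_policy(env.state)
    state, reward = env.step(action)
    total_reward += reward
print(total_reward)
```

A learning agent would replace `random_policy` with a function that improves from the rewards it has seen, which is exactly what the algorithms later in this post do.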
This table should give you a clear idea of the differences between general-purpose supervised and unsupervised machine learning and reinforcement learning:

| | SUPERVISED LEARNING | UNSUPERVISED LEARNING | REINFORCEMENT LEARNING |
|---|---|---|---|
| What it is | Learns from existing labeled samples or examples | No pre-labeled data or external teacher | Learns by interacting with the environment |
| Tasks | Classification and regression | Clustering and association | Exploration and exploitation |
| Mapping input to output | Both inputs and outputs are available, so the learner is trained on given samples | Finds underlying patterns rather than an input–output mapping | A sequential decision-making process where the next input depends on the learner's previous decisions |
| Typical algorithms | Linear regression, decision trees, support vector machines | k-means clustering, principal component analysis | Q-learning, SARSA, policy gradients |
| Where it runs | Any platform or app | Any platform or app | Any software or hardware device, especially settings with ongoing interaction |
The anatomy of Reinforcement Learning
The concept behind reinforcement learning is that an agent learns from the environment by interacting with it, trying different actions, and receiving rewards for the correct ones. As a child, for example, you learn from subjective experience. Fire is hot, so being near it keeps you warm – a positive experience. If you touch it, however, it hurts! You have just learned that fire is a good thing when you stay adequately far from it, because it gives you warmth; getting too close burns you.
Or imagine a robot learning to walk. Before it can walk across varying terrains without falling, it will fall many times. Unlike typical DL or ML, RL does not need previously collected training data; the machine generates its own experience. When the agent also learns without building a model of how the environment behaves, this is referred to as model-free reinforcement learning.
To get the best outcomes, therefore, doing is the only way to learn – just as human beings learn through interaction, observation, and rewards. The goal is to maximize the cumulative reward across attempts. RL is used to address problems where we must find the best policy to achieve a goal; it is simply a computational approach to learning from actions.
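To make the trial-and-error idea concrete, here is a minimal sketch of tabular Q-learning, one of the classic model-free algorithms named in the table above. The environment is an assumed toy one – a five-state corridor with a goal at the right end – and the hyperparameter values are illustrative, not prescriptive.

```python
import random

random.seed(0)

# Hypothetical 1-D corridor: states 0..4, goal at state 4.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]            # step left or step right

def step(state, action):
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

# Q-table: estimated cumulative reward for each (state, action) pair.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration

for episode in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: occasionally explore, otherwise exploit what we know.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # Trial-and-error update: nudge the estimate toward the reward signal.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# The learned greedy policy: the best action in each non-goal state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(policy)
```

Notice there is no labeled training set anywhere: the agent starts knowing nothing, and the Q-table is filled in purely from the rewards its own actions produce.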
Real-world applications: A caveat
The industrial applications of RL are as vast as they are impactful, spanning a wide gamut of problems, including:
- Personalizing multi-channel marketing
- Customizing medicine dosages for patients
- Automating ads bidding and buying
- Automated calibration of robots and other machines
- Optimizing supply chains
- Dynamic resource allocation in HVAC systems, wind farms, etc.
That said, building scalable and robust RL models to solve real-world problems requires addressing several key challenges that are typically not present in ‘toy’ environments.
In Part 2 of this blog, we shall dive deep into the practical design and implementation issues of working with reinforcement learning.