History and Future of Reinforcement Learning in Artificial Intelligence: A Real-World Approach

Machine Learning (ML) is a core part of artificial intelligence in which computers learn automatically from the data they are given. ML applications are increasingly present in our lives, and because predicting events with a measurable degree of reliability is valuable, they have spread to more and more sectors. Supervised learning is the most developed branch of ML. Everyday examples include forecasting the weather, displaying the most suitable adverts to website users, selecting the best route based on traffic, estimating future stock prices, and calculating the risk of a catastrophe for an insurance company.

One of the fastest-developing areas in recent years has been Reinforcement Learning (RL). In RL, an agent performs actions in an environment and receives a reward for each of them: a positive reward when the action helps achieve the goal and a negative reward (a penalty) when it does not. Over time, the agent learns which sequences of actions led to the highest rewards, so that it can repeat them, and which led to penalties, so that it can avoid them. This is very similar to how humans learn to ride a bicycle or master a sport. RL is well suited to scenarios that have a clear quantity to optimize, such as energy consumption or traffic congestion, and in which millions of different actions must be taken sequentially to converge to the optimal solution.
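The reward loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five-cell corridor environment, the reward values (+1.0 at the goal, -0.1 per other move), and the parameters `alpha`, `gamma`, and `epsilon` are hypothetical, and tabular Q-learning is used as one common RL method, not as the method of any system mentioned in this article.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells.
# The agent starts in cell 0 and the goal is cell 4.
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right
GOAL = N_STATES - 1


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    if next_state == GOAL:
        return next_state, 1.0, True   # positive reward for reaching the goal
    return next_state, -0.1, False     # small penalty for every other move


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (assumed settings)."""
    rng = random.Random(seed)
    # One value estimate per (state, action) pair, all starting at zero.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore occasionally, otherwise take the best-known action.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted best future value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action for every non-goal cell."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

After training on this toy corridor, the greedy policy should choose "move right" (+1) in every cell, which is the sequence of actions that collects the highest total reward.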

RL dates back to the 1980s, but its applications were quite limited by the computing resources available and by the exponential growth of the Q-table (the lookup table that stores an estimated value for every state-action pair) in complex applications. Deep Learning, however, can approximate enormous Q-tables with artificial neural networks, catapulting RL to a new stage. The difference between RL and supervised learning is best appreciated with an example. Imagine that we want to create an algorithm able to play an online game. With supervised learning, we would record the moves of a professional player and teach the algorithm to imitate that behaviour. The resulting agent would be at best as good as the professional it was trained on, but never better.
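To get a feel for how quickly a Q-table grows, the following sketch counts its entries for board-like problems. It rests on simplifying assumptions that are not from the article: every cell of the board takes one of three values, illegal positions are not excluded (so the count is an upper bound), and there is one action per cell. The function name `q_table_entries` is hypothetical.

```python
def q_table_entries(values_per_cell, n_cells, n_actions):
    """Upper bound on Q-table size: one entry per (state, action) pair."""
    n_states = values_per_cell ** n_cells  # grows exponentially with n_cells
    return n_states * n_actions


if __name__ == "__main__":
    # 3x3 board (tic-tac-toe sized): small enough to store in memory.
    print(q_table_entries(3, 9, 9))
    # 19x19 board (Go sized): print only the number of decimal digits.
    print(len(str(q_table_entries(3, 19 * 19, 19 * 19))))
```

Under these assumptions the small board needs 177,147 entries, while the Go-sized board needs a number with 175 digits, far beyond what any table can hold. This is the gap that a neural network fills by approximating the values instead of storing them one by one.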

RL has shown that an agent can play even better than a human when it is trained largely through rewards obtained by playing against itself. This happened in the game of Go, where AlphaGo beat the world-class champion Lee Sedol by four games to one. It was a great achievement for artificial intelligence, since the number of possible board configurations in Go is greater than the number of atoms in the observable universe. One of the most successful applications has been that of Google, in which the energy used to cool its data centres was reduced by up to 40%. Games with imperfect information such as poker, where uncertainty has to be addressed effectively, long remained a challenge for artificial intelligence. However, DeepStack, an algorithm built on deep learning, was able to defeat professional poker players at heads-up no-limit Texas hold’em.

Finally, although RL seems quite revolutionary, a few drawbacks remain to be solved, chiefly the amount of time needed to find the optimal solution and the difficulty of finding a proper configuration of all the parameters involved in the program.