Reinforcement Learning

Make the machine learn by trial and error

Reinforcement learning is used whenever there is an agent that acts in a dynamic environment. Some examples:

  • Chess AI (or any other game-playing AI)

  • Self-driving cars (after processing video with computer vision)

  • Robotics

Reinforcement learning works by letting the agent make decisions in an environment (usually a simulated one) and rewarding or punishing it according to the results. This is repeated many times, often tens of thousands of episodes or more. Eventually, the agent learns a policy, a mapping from situations to actions, that maximizes rewards and minimizes punishments, thus becoming "intelligent".
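
The loop described above can be sketched with tabular Q-learning, one common RL algorithm, on a made-up five-cell corridor. Everything here is an illustrative assumption rather than part of any real system: the environment (`step`), the reward values (+1 for reaching the goal, -0.1 for every other move), and the hyperparameters. The agent starts in cell 0, tries actions, and updates a table of estimated long-term rewards after every move.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # move one cell left or one cell right


def step(state, action):
    """Hypothetical toy environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True    # reward for reaching the goal
    return next_state, -0.1, False      # small punishment for any other move


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Run many episodes and return the learned table q[state][action_index]."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the table, sometimes explore.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: nudge the estimate toward the observed
            # reward plus the discounted best estimate of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best known action for every non-goal cell."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])]
            for row in q[:-1]]
```

Under these assumptions, `greedy_policy(train())` should return `+1` (move right) for each of the four non-goal cells: the agent was never told to go right, it only saw which moves ended up being rewarded.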

This type of learning can be very effective, even compared with traditional, hand-engineered forms of artificial intelligence. In chess, AlphaZero (RL-based) decisively beat Stockfish (a traditional search-based engine) in 2017, after only hours of self-play training.

However, since the agent is told only how well it did and not how it should act, this type of learning can sometimes lead to surprising outcomes. For example, in OpenAI's Hide-and-Seek simulation, seekers ultimately learned to exploit the simulation's physics engine and effectively fly to find the hiders.

OpenAI Hide and Seek simulation