Reinforcement Learning
Make the machine learn by trial and error
Reinforcement learning is used whenever there is an agent that acts in a dynamic environment. Some examples:
- Chess AI (or any game-playing AI, including videogames)
- Robotics
Reinforcement learning works by letting the agent make decisions in an environment (often a simulated one) and rewarding or punishing it according to the results. This is repeated many times, often tens of thousands of episodes or more. Eventually, the agent learns a policy (a rule for choosing an action in each situation) that maximizes rewards and minimizes punishments, thus becoming "intelligent".
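The reward-and-punish loop described above can be sketched with tabular Q-learning, which is one common RL algorithm among many. This is a minimal sketch, not a definitive implementation: the corridor environment, the `step`, `train`, and `greedy_policy` names, and all hyperparameter values are assumptions made up for illustration.

```python
import random

N_STATES = 5          # hypothetical corridor: positions 0..4, position 4 is the goal
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Toy environment (an assumption): returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True       # reward for reaching the goal
    return next_state, -0.01, False        # small punishment for every other move


def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Run many episodes and return the learned table of action values."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit what was learned, sometimes explore.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: nudge the estimate toward the observed
            # reward plus the discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best known action for each non-goal state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train())` should yield a policy that moves right (`+1`) from every position, since that is the shortest path to the rewarded goal. Note that the agent is never told this rule; it emerges from the rewards and punishments alone.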
This type of learning can be very effective, even when compared with traditional forms of artificial intelligence. In the world of chess, AlphaZero (RL-based) defeated Stockfish (a traditional search engine with a handcrafted evaluation function) in 2017, after only hours of self-play training.
However, since the agent is guided only by a reward signal rather than by labeled examples of correct behavior, this type of learning can sometimes lead to surprising outcomes. For example, in OpenAI's Hide-and-Seek simulation, seekers ultimately learned to exploit the simulation's physics engine and effectively fly to find the hiders.

OpenAI Hide and Seek simulation