Reinforcement learning works by letting the agent make decisions in a simulated environment, and punish or reward it according to its results. This is done repeatedly (tens of thousands of times). Eventually, the agent learns a reward function that maximizes rewards and minimizes punishments, thus becoming "intelligent".