Blog - Shawn Hymel

Reinforcement Learning Part 8: Temporal-Difference (TD) Learning

June 23, 2026

Temporal-difference (TD) learning is one of the foundational concepts in reinforcement learning (RL). It combines the idea of updating estimates before the final outcome is known, as dynamic programming (DP) does, with the idea of learning directly from experience, as we saw with the Monte Carlo (MC) methods in part 7. MC…
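To make the TD idea concrete, here is a minimal tabular TD(0) prediction sketch built on the standard update V(s) ← V(s) + α[r + γV(s′) − V(s)]. Like DP, it bootstraps from the current estimate of the next state. Like MC, it only needs sampled transitions, not transition probabilities. The random-walk environment, the names `td0_prediction`, `random_policy`, and `random_walk_step`, and the chosen `alpha`, `gamma`, and episode count are hypothetical choices for illustration, not code from the post.

```python
import random

# Hypothetical 7-state random walk (states 0..6), used only for illustration.
# States 0 and 6 are terminal; every episode starts in state 3.
# Reaching state 6 pays +1; every other transition pays 0.
TERMINALS = {0, 6}

def random_policy(state, rng):
    """Equiprobable policy: step left (-1) or right (+1)."""
    return rng.choice([-1, 1])

def random_walk_step(state, action, rng):
    """Environment dynamics: return (next_state, reward). The agent never sees probabilities."""
    next_state = state + action
    reward = 1.0 if next_state == 6 else 0.0
    return next_state, reward

def td0_prediction(policy, step, start_state, terminals,
                   alpha=0.05, gamma=1.0, num_episodes=2000, seed=0):
    """Tabular TD(0) evaluation of `policy`, learned from sampled transitions only."""
    rng = random.Random(seed)
    V = {}  # state -> value estimate; unseen states default to 0.0
    for _ in range(num_episodes):
        s = start_state
        while s not in terminals:
            a = policy(s, rng)
            s_next, r = step(s, a, rng)
            # Bootstrap: use the current estimate for the next state
            # (terminal states are worth 0) instead of waiting for the
            # episode's final return, as Monte Carlo would.
            v_next = 0.0 if s_next in terminals else V.get(s_next, 0.0)
            td_error = r + gamma * v_next - V.get(s, 0.0)
            V[s] = V.get(s, 0.0) + alpha * td_error
            s = s_next
    return V

if __name__ == "__main__":
    V = td0_prediction(random_policy, random_walk_step, 3, TERMINALS)
    for s in sorted(V):
        print(s, round(V[s], 2))
```

For this walk with γ = 1, the true values of states 1 through 5 are 1/6, 2/6, …, 5/6, so the printed estimates should land near those values, with some noise because the step size α is held constant.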
Reinforcement Learning Part 7: Monte Carlo Methods

June 16, 2026

In the previous post, we saw how dynamic programming (DP) could be used to solve the Bellman equations, but doing so required knowledge of the environment’s transition probabilities. Unfortunately, we do not have that luxury in most real-world reinforcement learning (RL) problems. In DP, we have a model of the environment, which means that the transition…
Reinforcement Learning Part 6: Dynamic Programming

June 9, 2026

Over the past few posts, we built up to the Bellman optimality equations, which allow us to express the existence of an optimal policy that would maximize total rewards from a given environment, assuming the environment qualifies as a Markov Decision Process (MDP). Going forward, we will focus on algorithms that attempt to find policies…
Reinforcement Learning Part 5: The Bellman Optimality Equations

June 4, 2026

Up until now, we’ve been focused on evaluating the environment or policy: how much reward we can expect to receive from a particular state or by taking a particular action. We still have not touched the “learning” part of reinforcement learning (RL). In other words, we want to find the best possible policy…
Reinforcement Learning Part 4: Expected Return, Value Functions, and Bellman Equations

May 26, 2026

In the previous post, we defined a policy, introduced the foundational concept of a Markov Decision Process (MDP), and talked about trajectories. We’re going to combine these concepts with the idea of future discounted returns to create value functions. We now introduce two new concepts: the state-value function (given by V(s)) that attempts to estimate the…
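Value functions are built on discounted returns, so a small worked example of the return itself may help. As a minimal sketch (the function name `discounted_returns` and the plain list-of-rewards input are hypothetical choices for illustration, not from the post), the return from every time step of a finished episode can be computed in a single backward pass using G_t = R_{t+1} + γG_{t+1}. The state-value V(s) under a policy is then the expected value of this return when starting from state s.

```python
def discounted_returns(rewards, gamma):
    """Return [G_0, G_1, ..., G_{T-1}] for one finished episode.

    rewards[k] is the reward received after step k, so
    G_t = rewards[t] + gamma * rewards[t+1] + gamma**2 * rewards[t+2] + ...
    Working backward applies the recursion G_t = rewards[t] + gamma * G_{t+1}.
    """
    returns = []
    g = 0.0  # return after the final step is 0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()  # restore time order: G_0 first
    return returns

if __name__ == "__main__":
    # G_0 = 1 + 0.5 * 0 + 0.25 * 2 = 1.5
    print(discounted_returns([1.0, 0.0, 2.0], gamma=0.5))  # [1.5, 1.0, 2.0]
```

The backward pass costs one multiply-add per step, rather than recomputing a full weighted sum for every starting time step.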
