  • Reinforcement Learning Part 8: Temporal-Difference (TD) Learning

    Temporal-difference (TD) learning is one of the foundational concepts in reinforcement learning (RL). It combines two ideas: updating estimates before the final outcome is known (bootstrapping), as dynamic programming (DP) does, and learning directly from experience, as the Monte Carlo (MC) methods from Part 7 do. MC…


  • Reinforcement Learning Part 7: Monte Carlo Methods

    In the previous post, we saw how dynamic programming (DP) can be used to solve the Bellman equations, but DP requires knowledge of the environment’s transition probabilities. Unfortunately, we do not have that luxury in most real-world reinforcement learning (RL) problems. In DP, we have a model of the environment, which means that the transition…


  • Reinforcement Learning Part 6: Dynamic Programming

    Over the past few posts, we built up to the Bellman optimality equations, which allow us to express the existence of an optimal policy that maximizes total reward in a given environment, assuming the environment qualifies as a Markov Decision Process (MDP). Going forward, we will focus on algorithms that attempt to find policies…


  • Reinforcement Learning Part 5: The Bellman Optimality Equations

    Up until now, we’ve focused on evaluating the environment or policy: how much reward we can expect to receive from a particular state or by taking a particular action. We still have not touched the “learning” part of reinforcement learning (RL). In other words, we want to find the best possible policy…


  • Reinforcement Learning Part 4: Expected Return, Value Functions, and Bellman Equations

    In the previous post, we defined a policy, introduced the foundational concept of a Markov Decision Process (MDP), and talked about trajectories. We’re going to combine these concepts with the idea of future discounted returns to create value functions. We now introduce two new concepts: the state-value function, V(s), which attempts to estimate the…