  • Reinforcement Learning Part 9: TD(λ) and Eligibility Traces

    In the previous post, we saw how temporal difference (TD) learning updates value predictions in the middle of an episode rather than waiting until the very end, as Monte Carlo (MC) methods do. If you recall, the TD(0) algorithm updates value estimates after every step using a single reward in order to bootstrap…


  • Reinforcement Learning Part 8: Temporal-Difference (TD) Learning

    Temporal Difference (TD) learning is one of the foundational concepts in reinforcement learning (RL). It combines the notion of updating estimates before the final outcome is known, similar to how dynamic programming (DP) works, with the notion of learning directly from experience, like we saw with the Monte Carlo (MC) methods in part 7. MC…


  • Reinforcement Learning Part 7: Monte Carlo Methods

    In the previous post, we saw how dynamic programming (DP) could be used to solve the Bellman equations, but it required knowledge of the environment’s transition probabilities. Unfortunately, we do not have that luxury in most real-world reinforcement learning (RL) problems. In DP, we have a model of the environment, which means that the transition…


  • Reinforcement Learning Part 6: Dynamic Programming

    Over the past few posts, we built up to the Bellman optimality equations, which allow us to express the existence of an optimal policy that would maximize total rewards from a given environment, assuming the environment qualifies as a Markov Decision Process (MDP). Going forward, we will focus on algorithms that attempt to find policies…


  • Reinforcement Learning Part 5: The Bellman Optimality Equations

    Up until now, we’ve been focused on evaluating the environment or policy: how much reward we can expect to receive from a particular state or by taking a particular action. We still have not touched the “learning” part of reinforcement learning (RL). In other words, we want to find what is the best possible policy…
