  • Reinforcement Learning Part 6: Dynamic Programming

    Over the past few posts, we built up to the Bellman optimality equations, which allow us to express the existence of an optimal policy that would maximize total rewards from a given environment, assuming the environment qualifies as a Markov Decision Process (MDP). Going forward, we will focus on algorithms that attempt to find policies…


  • Reinforcement Learning Part 5: The Bellman Optimality Equations

    Up until now, we’ve been focused on evaluating the environment or policy: how much reward we can expect to receive from a particular state or from taking a particular action. We still have not touched the “learning” part of reinforcement learning (RL). In other words, we want to find what is the best possible policy…


  • Reinforcement Learning Part 4: Expected Return, Value Functions, and Bellman Equations

    In the previous post, we defined a policy, provided the foundational concept of a Markov Decision Process (MDP), and talked about trajectories. We’re going to combine these concepts with the idea of future discounted returns to create value functions. We now introduce two new concepts: the state-value function (given by V(s)) that attempts to estimate the…


  • Reinforcement Learning Part 3: Policies, Markov Decision Processes (MDPs), and Trajectories

    In the third part of this reinforcement learning (RL) series, we’re going to give a formal definition for a policy and then conceptualize how actions and states play out in a trajectory. While we discussed rewards and returns in the previous post, we’re going to see how Markov Decision Processes (MDPs) provide the underlying foundation…


  • Reinforcement Learning Part 2: Rewards, Returns, and the Discount Factor

    In this second post on reinforcement learning (RL), we build on the introduction from part 1 by revisiting the idea of a reward and building up to the idea of discounted returns. Recall that the goal of RL is to maximize the rewards earned by the agent over time. We’re going to discuss three main…
