Learn in Public | Introduction to Deep Reinforcement Learning Course by Hugging Face

Overview

These are my notes from following the Deep Reinforcement Learning Course by Hugging Face.

Syllabus
Pasted image 20250724141943.png
Pasted image 20250724141950.png

Unit 1: Introduction to Deep Reinforcement Learning:

Goal:
Train a Deep Reinforcement Learning agent, a lunar lander, to land correctly on the Moon using Stable-Baselines3, a Deep Reinforcement Learning library.

The big picture:
The idea behind Reinforcement Learning is that an agent (an AI) will learn from the environment by interacting with it (through trial and error) and receiving rewards (negative or positive) as feedback for performing actions.

A formal definition:
Reinforcement learning is a framework for solving control tasks (also called decision problems) by building agents that learn from the environment by interacting with it through trial and error and receiving rewards (positive or negative) as their only feedback.

The reinforcement learning framework:
Pasted image 20250724143336.png

This RL loop outputs a sequence of state, action, reward and next state.

Pasted image 20250724144157.png
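To make the loop concrete, here is a minimal sketch in plain Python. The `CorridorEnv` class, its reward values, and the `run_episode` helper are all hypothetical names made up for illustration; they are not part of the course or of any library. The point is only to show how each step yields a (state, action, reward, next state) tuple.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: 0 = move left, 1 = move right (assumed encoding)
        move = 1 if action == 1 else -1
        self.state = max(0, min(self.length - 1, self.state + move))
        done = self.state == self.length - 1
        # Assumed reward scheme: small penalty per step, bonus at the goal.
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Collect (state, action, reward, next_state) tuples for one episode."""
    trajectory = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return trajectory


rng = random.Random(0)
trajectory = run_episode(CorridorEnv(), lambda state: rng.choice([0, 1]))
for transition in trajectory[:3]:
    print(transition)
```

Here the policy is purely random, so the agent is not learning anything yet; a learning algorithm would use the collected rewards to improve the policy.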

The agent’s goal is to maximize its expected cumulative reward, called the expected return.

The reward hypothesis: the central idea of Reinforcement Learning

Markov Property

Observations/States Space

Action Space
The Action space is the set of all possible actions in an environment.
The actions can come from a discrete space (a finite number of possible actions) or a continuous space (an infinite number of possible actions).
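As a rough illustration of the difference, the sketch below samples one action from each kind of space. The action names and the steering-angle bounds are assumptions chosen for the example, not values from the course.

```python
import random

# Discrete space: a finite set of actions (hypothetical names).
DISCRETE_ACTIONS = ["left", "right", "up", "down"]

# Continuous space: any real value in an interval,
# e.g. a steering angle in degrees (assumed bounds).
STEERING_LOW, STEERING_HIGH = -30.0, 30.0


def sample_discrete(rng):
    """Pick one of the finitely many actions."""
    return rng.choice(DISCRETE_ACTIONS)


def sample_continuous(rng):
    """Pick a real-valued action inside the allowed interval."""
    return rng.uniform(STEERING_LOW, STEERING_HIGH)


rng = random.Random(0)
print(sample_discrete(rng))
print(sample_continuous(rng))
```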

Rewards and the discounting:

Our discounted expected cumulative reward is:
Pasted image 20250724145754.png
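In the standard formulation, the discounted return adds up future rewards, each multiplied by the discount rate gamma raised to the number of steps until it is received, with gamma between 0 and 1. A minimal sketch of that computation follows; the function name and the default `gamma=0.99` are assumptions for illustration.

```python
def discounted_return(rewards, gamma=0.99):
    """Return the sum of gamma**k * rewards[k] for k = 0, 1, 2, ..."""
    total = 0.0
    # Work backwards so each step applies G = r + gamma * G.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


rewards = [1.0, 1.0, 1.0]
print(discounted_return(rewards, gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

A gamma close to 1 makes the agent care about long-term rewards, while a gamma close to 0 makes it care mostly about the immediate reward.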

The type of tasks

  1. Episodic tasks
    In this case, we have a starting point and an ending point (a terminal state). This creates an episode: a list of States, Actions, Rewards, and new States.

  2. Continuing tasks
    These are tasks that continue forever (no terminal state). In this case, the agent must learn how to choose the best actions while simultaneously interacting with the environment.

The exploration/exploitation trade-off

The two main approaches for solving RL problems


Bonus Unit 1: Introduction to Deep Reinforcement Learning with Huggy