Video description
A neat introduction for diving into deep reinforcement learning.
Sandeep Chigurupati
Reinforcement Learning in Motion introduces you to the exciting world of machine systems that learn from their environments! Developer, data scientist, and expert instructor Phil Tabor guides you from the basics all the way to programming your own constantly learning AI agents. In this course, he’ll break down key concepts like how RL systems learn, how to sense and process environmental data, and how to build and train AI agents. As you learn, you’ll master the core algorithms and get to grips with tools like OpenAI Gym, NumPy, and Matplotlib.
Reinforcement learning systems learn by doing, and so will you in this hands-on course! You’ll build and train a variety of algorithms as you go, each with a specific purpose in mind. The rich and interesting examples include simulations that train a robot to escape a maze, help a mountain car get up a steep hill, and balance a pole on a sliding cart. You’ll even teach your agents how to navigate Windy Gridworld, a standard exercise in finding the optimal path even when a crosswind keeps pushing your agent off course!
With reinforcement learning, an AI agent learns from its environment, constantly responding to the feedback it gets. The agent optimizes its behavior to avoid negative consequences and to seek out rewards. The resulting algorithms are always looking for the most positive and efficient outcomes!
Importantly, with reinforcement learning you don’t need a mountain of data to get started. You just let your AI agent poke and prod its environment, which makes it much easier to take on novel research projects without well-defined training datasets.
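To give a flavor of that feedback loop, here is a minimal sketch (not taken from the course itself) of an agent poking and prodding the CartPole environment from the OpenAI Gym, using the classic Gym interface. The agent below simply acts at random as a placeholder; the course teaches you to replace that random choice with behavior the agent has learned.

import gym

# A minimal sketch of the agent-environment feedback loop (illustrative only):
# the agent picks an action, the environment responds with an observation and
# a reward, and the loop repeats until the episode ends.
env = gym.make('CartPole-v0')

for episode in range(3):
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        # A real agent would choose its action based on what it has learned;
        # here we sample a random action as a stand-in.
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        total_reward += reward
    print(f'Episode {episode}: total reward = {total_reward}')

env.close()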
Inside:
- What is a reinforcement learning agent?
- An introduction to the OpenAI Gym
- Identifying appropriate algorithms
- Implementing RL algorithms using NumPy
- Visualizing performance with Matplotlib
You’ll need to be familiar with Python and machine learning basics. Examples use Python libraries like NumPy and Matplotlib. You'll also need some understanding of linear algebra and calculus; see the equations in the Free Downloads section for examples.
Phil Tabor is a lifelong coder with a passion for simplifying and teaching complex topics. A physics PhD and former Intel process engineer, he works as a data scientist, teaches machine learning on YouTube, and contributes to Sensenet, an open source project using deep reinforcement learning to teach robots to identify objects by touch.
After watching the first few sections, you'll be able to experiment with some simple algorithms and will definitely want to continue learning more.
Rob Pacheco
Gives a fantastic look into the examples and mathematical background.
Harald Kuhn
It prepares you to apply reinforcement learning directly to a problem you have in hand!
Yaser Marey
Table of Contents
INTRODUCTION TO REINFORCEMENT LEARNING
Course introduction
00:05:01
Getting Acquainted with Machine Learning
00:09:26
How Reinforcement Learning Fits In
00:05:26
Required software
00:03:10
KEY CONCEPTS
Understanding the agent
00:05:04
Defining the environment
00:05:52
Designing the reward
00:04:24
How the agent learns
00:09:59
Choosing actions
00:07:15
Coding the environment
00:06:23
Finishing the maze-running robot problem
00:05:00
BEATING THE CASINO: THE EXPLORE-EXPLOIT DILEMMA
Introducing the multi-armed bandit problem
00:03:47
Action-value methods
00:06:43
Coding the multi-armed bandit test bed
00:06:55
Moving the goal posts: nonstationary problems
00:07:08
Optimistic initial values and upper confidence bound action selection
00:11:51
Wrapping up the explore-exploit dilemma
00:04:51
SKATING THE FROZEN LAKE: MARKOV DECISION PROCESSES
Introducing Markov decision processes and the frozen lake environment
00:09:21
Even robots have goals
00:06:45
Handling uncertainty with policies and value functions
00:08:37
Achieving mastery: Optimal policies and value functions
00:07:30
Skating off the frozen lake
00:05:29
NAVIGATING GRIDWORLD WITH DYNAMIC PROGRAMMING
Crash-landing on planet Gridworld
00:09:42
Let’s make a plan: Policy evaluation in Gridworld
00:08:18
The best laid plans: Policy improvement in the Gridworld
00:03:57
Hastening our escape with policy iteration
00:04:57
Creating a backup plan with value iteration
00:06:09
Wrapping up dynamic programming
00:04:08
NAVIGATING THE WINDY GRIDWORLD WITH MONTE CARLO METHODS
The windy gridworld problem
00:05:33
Monte who?
00:07:12
No substitute for action: Policy evaluation with Monte Carlo methods
00:03:53
Monte Carlo control and exploring starts
00:07:43
Monte Carlo control without exploring starts
00:06:15
Off-policy Monte Carlo methods
00:12:06
Return to the frozen lake and wrapping up Monte Carlo methods
00:06:17
BALANCING THE CART POLE: TEMPORAL DIFFERENCE LEARNING
The cart pole problem
00:04:57
TD(0) prediction
00:09:19
On-policy TD control: SARSA
00:07:34
Off-policy TD control: Q learning
00:05:13
Back to school with double learning
00:09:06
Wrapping up temporal difference learning
00:05:43
CLIMBING THE MOUNTAIN WITH APPROXIMATION METHODS
The continuous mountain car problem
00:04:31
Why approximation methods?
00:05:47
Stochastic gradient descent: The intuition
00:04:05
Stochastic gradient descent: The mathematics
00:05:18
Approximate Monte Carlo predictions
00:08:43
Linear methods and tiling
00:10:54
TD(0) semi-gradient prediction
00:07:36
Episodic semi-gradient control: SARSA
00:08:52
Over the hill: wrapping up approximation methods and the mountain car problem
00:06:10
SUMMARY
Course recap
00:10:11
The frontiers of reinforcement learning
00:06:31
What to do next
00:04:05