Rating 1.0 out of 5 (2 ratings in Udemy)
What you'll learn
- Be able to start Deep reinforcement-learning research
- Be able to start a Deep reinforcement-learning engineering role
- Understand modern state-of-the-art Deep reinforcement-learning knowledge
- Understand Deep reinforcement-learning knowledge
Description
Hello, I am Nitsan Soffair, a Deep RL researcher at BGU.
In my Deep reinforcement-learning course you will learn the newest state-of-the-art Deep reinforcement-learning knowledge.
You will do the following:
Get state-of-the-art knowledge regarding:
Model types
Algorithms and approaches
Function approximation
Deep reinforcement-learning
Deep Multi-agent Reinforcement-learning
Validate your knowledge by answering short and very short quizzes after each lecture.
Be able to complete the course in about 2 hours.
Syllabus
Model types
Markov decision process (MDP)
A discrete-time stochastic control process.
Partially observable Markov decision process (POMDP)
A generalization of the MDP in which the agent cannot directly observe the underlying state.
Decentralized Partially observable Markov decision process (Dec-POMDP)
A generalization of the POMDP to multiple decentralized agents.
Algorithms and approaches
Bellman equations
A necessary condition for optimality associated with dynamic programming.
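For reference, one common way to write the Bellman optimality equation for a discounted MDP is sketched below. The notation is an assumption, not taken from the course: P is the transition probability, R the reward, and γ the discount factor.

```latex
V^*(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma V^*(s') \bigr]
```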
Model-free
A model-free algorithm is an algorithm which does not use the transition probabilities or reward function (the model) of the MDP.
Off-policy
An off-policy algorithm learns about one policy (the target policy) while acting in the environment with a different policy (the behavior policy).
Exploration-exploitation
A trade-off in reinforcement learning between exploring new actions to gather information and exploiting the actions already known to yield high reward.
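One simple way to handle this trade-off is an epsilon-greedy rule, sketched below. The function name and arguments are hypothetical, chosen only for illustration: with probability epsilon the agent explores, otherwise it exploits its current value estimates.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    # exploit: index of the highest estimated action value
    return max(range(len(q_values)), key=lambda a: q_values[a])


# With epsilon = 0 the choice is purely greedy.
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```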
Value-iteration
An iterative algorithm that repeatedly applies the Bellman optimality backup.
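A minimal sketch of value iteration on a tiny, made-up MDP follows. The two-state MDP, the dictionary layout `P[state][action] = [(probability, next_state, reward), ...]`, and the default parameters are all assumptions for illustration.

```python
def value_iteration(P, gamma=0.9, tol=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Best expected one-step return over all actions available in s.
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


# Hypothetical two-state MDP with deterministic transitions.
P = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}
V = value_iteration(P)
```

In this example, staying in B forever is optimal, so the values approach 2 / (1 - 0.9) = 20 for B and 1 + 0.9 * 20 = 19 for A.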
SARSA
An on-policy algorithm for learning a Markov decision process policy.
Q-learning
A model-free reinforcement learning algorithm to learn the value of an action in a particular state.
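The difference between SARSA and Q-learning shows up in the update target. The tabular sketch below assumes a hypothetical nested dictionary `Q[state][action]` and omits terminal-state handling for brevity.

```python
def q_learning_update(Q, s, a, r, s2, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the best action in the next state."""
    target = r + gamma * max(Q[s2].values())
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action a2 actually taken in the next state."""
    target = r + gamma * Q[s2][a2]
    Q[s][a] += alpha * (target - Q[s][a])


# Hypothetical Q-table with two states and two actions.
Q = {"s0": {"left": 0.0, "right": 0.0}, "s1": {"left": 1.0, "right": 2.0}}
q_learning_update(Q, "s0", "right", 1.0, "s1")   # target uses max over s1
sarsa_update(Q, "s0", "left", 1.0, "s1", "left")  # target uses Q[s1][left]
```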
Function approximation
Function approximators
The function-approximation problem asks us to select, from a well-defined class of functions, one that closely matches ("approximates") a target function in a task-specific way.
Policy-gradient
Covers value-based, policy-based, and actor-critic methods, the policy gradient, and the softmax policy.
REINFORCE
A policy-gradient algorithm.
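To make the idea concrete, here is a minimal sketch of REINFORCE with a softmax policy on a made-up two-armed bandit. The bandit, the learning rate, and the function names are assumptions for illustration; each episode is a single step, so the return equals the reward.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards, steps=2000, lr=0.1, seed=0):
    """Run REINFORCE on a bandit with fixed per-arm rewards."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # one preference per action
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(rewards)), weights=probs)[0]
        G = rewards[a]  # return of this one-step episode
        # For a softmax policy, d log pi(a) / d theta_k = 1[k == a] - probs[k].
        for k in range(len(theta)):
            theta[k] += lr * G * ((1.0 if k == a else 0.0) - probs[k])
    return softmax(theta)


probs = reinforce_bandit([0.0, 1.0])  # arm 1 pays more, so its probability grows
```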
Deep reinforcement-learning
Deep Q-Network (DQN)
A deep reinforcement-learning algorithm using experience replay and fixed Q-targets.
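The two ingredients named above can be sketched without a neural network. In this illustration, `ReplayBuffer` and `td_targets` are hypothetical names, and `target_q` is an assumed callable standing in for the frozen target network: it maps a state to a list of action values.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.buffer), batch_size)


def td_targets(batch, target_q, gamma=0.99):
    """Compute r + gamma * max_a' Q_target(s', a'), with no bootstrap at episode end."""
    targets = []
    for _, _, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(target_q(next_state))
        targets.append(reward + bootstrap)
    return targets
```

In a full agent, the online network would be regressed toward these targets, and the target network's weights would be copied from the online network only every so often.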
Deep Recurrent Q-Learning (DRQN)
A deep reinforcement-learning algorithm for POMDPs that extends DQN with an LSTM.
Optimistic Exploration with Pessimistic Initialization (OPIQ)
A deep reinforcement-learning algorithm for MDPs based on DQN.
Deep Multi-agent Reinforcement-learning
Value Decomposition Networks (VDN)
A multi-agent deep reinforcement-learning algorithm for Dec-POMDP.
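The core idea of VDN, an additive decomposition of the joint action value into per-agent values, can be sketched as follows. The function names and the plain-list representation of each agent's Q-values are assumptions for illustration; in practice each agent's values come from its own network.

```python
def vdn_joint_q(per_agent_q, joint_action):
    """Joint value as the sum of each agent's value for its own action."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))


def greedy_joint_action(per_agent_q):
    """Each agent picks its own best action; under a sum this also maximizes the joint value."""
    return tuple(max(range(len(q)), key=lambda a: q[a]) for q in per_agent_q)


# Hypothetical values for two agents with two actions each.
qs = [[1.0, 3.0], [2.0, 0.5]]
best = greedy_joint_action(qs)
q_tot = vdn_joint_q(qs, best)
```

Because the joint value is a sum, each agent can act greedily on its own values at execution time without seeing the other agents.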
QMIX
A multi-agent deep reinforcement-learning algorithm for Dec-POMDP.
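QMIX replaces the plain sum with a mixing function that is monotonic in each agent's value. The sketch below is a deliberate simplification: a single linear layer with fixed, hypothetical weights stands in for the mixing network, whereas in QMIX the weights are produced from the global state. Taking the absolute value of the weights is what keeps the mixing monotonic.

```python
def monotonic_mix(agent_qs, weights, bias):
    """Mix per-agent values with non-negative weights so the joint value never
    decreases when any single agent's value increases."""
    return sum(abs(w) * q for w, q in zip(weights, agent_qs)) + bias


# Hypothetical weights; the negative one is made non-negative by abs().
q_tot = monotonic_mix([1.0, 2.0], weights=[-0.5, 2.0], bias=0.1)
```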
QTRAN
A multi-agent deep reinforcement-learning algorithm for Dec-POMDP.
Weighted QMIX
A deep multi-agent reinforcement-learning algorithm for Dec-POMDP.
Resources