Reinforcement Learning – Ep. 30 (Deep Learning SIMPLIFIED)

Computers that can play games have always impressed the computing world. In December 2013,
a small group of AI researchers from a London-based company called DeepMind released a
ground-breaking paper called “Playing Atari with Deep Reinforcement Learning”. And just a
little over a month later, Google announced that they had bought DeepMind for a really big
sum of money. Since then, there’s been all kinds of talk about reinforcement learning in the
field of AI. In January of 2016, Google announced that the appropriately named AlphaGo was
able to beat the reigning European Go champion. We’re gonna take the mystery out of
reinforcement learning so you can see how all these amazing feats are possible.

Reinforcement learning has roots in AI, animal psychology, and control theory. At the heart
of it, it involves an autonomous agent, like a person, animal, robot, or deep net, learning
to navigate an uncertain environment with the goal of maximizing a numerical reward.

Sports are a great example of this. Just think of what our autonomous agent would
have to deal with in a tennis match. The agent would have to consider its actions,
like its serves, returns, and volleys. These actions change the state of the game,
or in other words – the current set, the leading player, things like that. And every action is performed with a reward
in mind – winning a point, in order to win the game, set, and match. Our agent needs to follow a policy, or a set
of rules and strategies, in order to maximize the final score.

But if you were building an autonomous agent, how would you actually model this? We know
that the agent’s actions will change the state of the environment. So a model would need to
be able to take a state and an action as input, and generate the maximum expected reward as
output. But since a single action only gets you to the next state, you’ll need to take into
account the total expected reward for every action from the current state until the end
state. The way this works will be different for every application, and you’re probably not
surprised to know that building a tennis agent is different from building an Atari agent.

The researchers at DeepMind used a series
of Atari screenshots to train a convolutional neural network, with a couple of tweaks. The
output wasn’t a class, but instead it was a target number for the maximum reward. So it was
actually dealing with regression, not classification. They also didn’t use pooling layers,
since unlike image recognition, the individual positions of game objects, like the player,
are all important and can’t be reduced. A recurrent net could have been used too, as long as
the output layer was tailored for regression, and the input at each time step included the
action and the environment state.

DeepMind’s network is known as the Deep Q-Network, or DQN for short. The DQN uses the
principle of predicting the maximum reward given a state and action. It was actually
patented by Google, and it’s seen a lot of improvements like experience replay and the
dueling network architecture.

Reinforcement learning isn’t just a fancy,
smart-sounding way to say supervised learning. Supervised learning is all about making sense
of the environment based on historical examples. But that isn’t always the best way to do
things. Imagine trying to drive a car
in heavy traffic based on the road patterns you observed the week before, when the roads
were clear. That’s about as effective as driving when
you’re only looking in the rearview mirror.

Reinforcement learning, on the other hand, is
all about reward. You get points for your actions – like staying
in your lane, driving under the speed limit, signaling when you’re supposed to, things
like that. But you can also lose points for dangerous
actions like tailgating and speeding. Your objective is to get the maximum number
of points possible given the current state of the traffic on the road around you. Reinforcement learning emphasizes that an
action results in a change of the state, which is something a supervised learning model doesn’t
focus on.

In April of 2016, Amazon founder Jeff Bezos
talked about how his company is a great place to fail, and how most companies are unwilling
to suffer through “the string of failed experiments”. You can think of this as a statement about
rewards. Most organizations operate in the realm of
conventional wisdom, which is about exploiting what is known to achieve finite rewards with
known odds. Some groups venture into the unknown and explore
new territory with the prospect of outsized rewards at long odds. And many of these organizations do fail! But some of them succeed and end up changing
the world. With reinforcement learning, an agent can
weigh the trade-off between exploration and exploitation, and choose the path to the
maximum expected reward.

This channel’s all about Deep Learning,
so we focused on the topic of building a deep reinforcement net. But reinforcement learning falls under the
broader umbrella of artificial intelligence. It involves topics like goal setting, planning,
and perception. And it can even form a bridge between AI and
the engineering disciplines. Reinforcement learning is simple and powerful,
and given the recent advances, it has the potential to become a big force in the field
of Deep Learning. If you wanna learn more about Deep Learning,
hang around after this for our recommendations, or visit us on Facebook and Twitter. Thanks for watching, and we’ll see you next
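
The transcript describes a model that takes a state and an action and outputs the maximum expected reward, counted all the way to the end state. A rough way to see that idea in code is tabular Q-learning, which uses a lookup table where the video uses a deep net. This is only a sketch and not DeepMind's implementation. The corridor environment, the `step`, `choose_action`, and `train` helpers, and the values of `alpha`, `gamma`, and `epsilon` are all assumptions made for illustration. The `epsilon` parameter is one simple way to handle the exploration versus exploitation trade-off mentioned above.

```python
import random

# Hypothetical toy environment (not from the video): a corridor of 5 cells.
# The agent starts in cell 0; reaching cell 4 ends the episode with reward +1.
N_STATES = 5
ACTIONS = (-1, +1)  # step left, step right
GOAL = N_STATES - 1


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    # Break ties at random so the untrained agent does not get stuck.
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn Q(state, action): the expected discounted reward until the end state."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Immediate reward plus the best discounted reward from the next state.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q


q_table = train()
# The greedy policy picks the action with the highest learned value in each cell.
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)}
print(policy)
```

With these settings the learned policy should be to step right (`+1`) in every cell, since that is the shortest path to the reward. A deep reinforcement net replaces the `q` table with a network that estimates the same values from raw input such as screenshots.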

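The transcript names experience replay without describing it. As the technique is commonly described, the agent stores past transitions and trains on random samples of them instead of only on the most recent frame. The sketch below shows that idea under stated assumptions: the `ReplayBuffer` class, its method names, and the tiny capacity are hypothetical and are not taken from the DQN code.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled at random for training."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest transition when full.
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Random sampling breaks up the correlation between consecutive steps.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


# Illustrative usage: integers stand in for real game states.
buffer = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buffer.push(state=t, action=0, reward=0.0, next_state=t + 1, done=False)

batch = buffer.sample(batch_size=2)
print(len(buffer), batch)
```

Only the three most recent transitions survive here because the capacity is 3. A real agent would use a much larger buffer and feed each sampled batch to the network as training data.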

  1. DeepLearning.TV

    September 15, 2016 at 10:38 pm

    Fun topic to learn and skill to have – enjoy 🙂

  2. Jacob

    September 15, 2016 at 11:24 pm

    How does it actually choose which action to take though? Does it just make a tree of all possible actions and the predicted reward up to a certain depth and then choose the action with the highest reward (similar to minimax)? That seems like it would be very computationally expensive.

  3. Henrique Baqueiro

    December 19, 2016 at 6:20 pm

    I'm still a bit confused how exactly the Atari's screenshots helped the net to make decisions… And also, it was said that it was not a classification problem, but rather a regression problem. Was this topic already covered in any video? Thanks for the videos, I've watched all of them from the very beginning 🙂

  4. 2stefan2000

    December 23, 2016 at 10:26 pm

    And how does this look like in actual code?

  5. Alister

    January 31, 2017 at 1:33 pm

    thank you:)

  6. adaao mascarelli

    March 22, 2017 at 3:16 am

    Do you have something about extreme learning?

  7. Davide Deon

    April 4, 2017 at 8:08 pm

    Are there gonna be more videos? What happened? I really enjoyed this series!

  8. Laha Ale

    April 20, 2017 at 1:28 pm

    who will give the reward? where is the score come from?

  9. Sharan Duggirala

    April 26, 2017 at 3:41 pm

    Please make more videos, I'm not sure why your videos have stopped completely.

  10. Yan Meng

    May 21, 2017 at 5:42 am

    Hi, I am a Machine Learning student. I found your videos here explain the concepts and problems very clear. Unfortunately, Youtube is blocked in China. Can I ask you to grant me translate and redistribute your videos in China? Thank you!

  11. Akshay Singh

    May 28, 2017 at 7:25 pm

    we are using deep learning in our project. like intelligent gas sensor. i need a advice which net to choose or which software to use

  12. Jordan Shackelford

    July 12, 2017 at 10:53 pm

    why you stop making videos?

  13. moiz khalid

    July 16, 2017 at 6:27 am

    why you stop making videos???

  14. James Ballari

    July 19, 2017 at 4:37 am

    Great Video!!

    I've created a simple implementation of Reinforcement Learning (Deep Learning ) to train a model to play the game tic tac toe (3×3). it kinda plays with itself learns from the outcome and gets better, I've used tensor flow for this.

    you can find the code at github link given below:

  15. 许建辉

    November 16, 2017 at 3:20 am

    the last video

  16. liutasx

    December 16, 2017 at 11:41 am

    Not clearly explained. I haven't understood that is reinforced learning.

  17. L Royal

    March 1, 2018 at 5:17 am

    Finished watching the series/playlist. Thank you. Your explanation, enunciation, and choice of images/visuals is on point. Keep up the good work. I learnt more from your videos than from my class albeit at a higher level. I hope you'll be rewarded with good ad revenue.

    If possible, please cover other important concepts like SVM, Naive Bayes, probabilistic graphic models etc.. or maybe new series called Machine Learning simplified?

  18. Dmitry Matveyev

    April 7, 2018 at 7:42 am

    So good

  19. Jim konstantakos

    May 15, 2018 at 6:30 pm

    I am at the beginning of writing my thesis, which focuses on deep learning, and I just finished watching all your videos and they were exactly what I wanted. I just wanna say thanks, you really helped, guys!

  20. Bayesian Lee

    October 3, 2018 at 8:17 am

    So easy and fun all vids on this channel.

    Im just sad it's been 2 years this channel uploaded.

    Get back to me!!!

  21. Ray VR

    April 16, 2019 at 8:44 pm

    Super good presentation. Thanks.

  22. Anirban Dutta

    July 10, 2019 at 3:59 pm

    beautifully said. It's a loss that your channel stopped.
