Google DeepMind’s Deep Q-Learning & Superhuman Atari Gameplays | Two Minute Papers #27

Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér. This one is going to be huge, certainly one
of my favorites. This work is a combination of several techniques
that we have talked about earlier. If you don't know some of these terms, that's perfectly
okay: you can catch up by clicking on the popups or checking the description box, but
you'll get the idea even if you watch only this episode. So, first, we have a convolutional neural
network, which helps process images and understand what is depicted in them. And we have a reinforcement learning algorithm, which
helps create strategies; to be more exact, it decides what our next action should
be, which buttons we push on the joystick. This technique mixes these two
concepts together, and we call it Deep Q-learning. It is able to learn to play games the
same way a human would: it is not exposed to any additional information in the code;
all it sees is the screen and the current score. When it starts learning to play an old game,
Atari breakout, at first, the algorithm loses all of its lives without any signs of intelligent
action. If we wait a bit, it becomes better at playing
the game, roughly matching the skill level of an adept player. But here's the catch: if we wait longer,
we get something absolutely spectacular. It finds out that the best way to win the
game is to dig a tunnel through the bricks and hit them from behind. I really didn't
know this, and it is an incredible moment: I can use my computer, this box next to
me, to create new knowledge and find out things I didn't know before. This
is completely absurd; science fiction is not the future, it is already here. It also plays many other games, and the percentages
show the algorithm's score relative to a human player's. Above 70% means that it's
great, and above 100% it's superhuman. As a follow-up work, scientists at DeepMind
started experimenting with 3D games, and after a few days of training, it could learn to
drive on ideal racing lines and pass others with ease. I’ve had a driving license for
a while now, but I still don’t always get the ideal racing lines right. Bravo. I have heard the complaint that this is not
real intelligence because it doesn't know the concept of a ball or what exactly it is
doing. Edsger Dijkstra once said, "The question of whether machines can think…
is about as relevant as the question of whether submarines can swim." Beyond the fact that rigorously defining intelligence
leans more into the domain of philosophy than science, I’d like to add that I am perfectly
happy with effective algorithms. We use these techniques to accomplish different tasks,
and they are really good problem solvers. In the Breakout game, you, as a person, learn
the concept of a ball so that you can use this knowledge as machinery to perform
better. If this is not the case, then whoever knows a lot but can't use it to achieve anything
useful is not an intelligent being, but an encyclopedia. What about the future? There are two major
unexplored directions:
– the algorithm doesn't have long-term memory,
– and even if it had one, it wouldn't be able to generalize its knowledge to other, similar tasks.
These are super exciting directions for future work. Thanks for watching and for your generous support, and I'll see you next time!
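Deep Q-learning swaps the lookup table of classic Q-learning for a convolutional network that reads the raw screen, but the learning rule underneath is the ordinary Q-update. The sketch below shows that core update on a toy five-state "game"; the chain environment, reward, and hyperparameters are illustrative inventions, not details from the DeepMind paper:

```python
import random

# Toy chain environment standing in for an Atari game: states 0..4,
# and moving right from state 4 earns reward 1 and ends the episode.
N_STATES = 5
LEFT, RIGHT = 0, 1

def step(s, a):
    """Return (next_state, reward, done)."""
    if a == RIGHT and s == N_STATES - 1:
        return 0, 1.0, True
    s2 = min(max(s + (1 if a == RIGHT else -1), 0), N_STATES - 1)
    return s2, 0.0, False

# Q-table: Deep Q-learning replaces this table with a convolutional
# network that maps raw screen pixels to one Q-value per joystick action.
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, eps = 0.1, 0.9, 0.2   # learning rate, discount, exploration

random.seed(0)
for _ in range(500):                # training episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection (ties broken toward RIGHT)
        if random.random() < eps:
            a = random.choice((LEFT, RIGHT))
        else:
            a = RIGHT if Q[s][RIGHT] >= Q[s][LEFT] else LEFT
        s2, r, done = step(s, a)
        # The core Q-learning update: nudge Q[s][a] toward the reward
        # plus the discounted value of the best action in the next state.
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2

greedy_policy = [RIGHT if Q[s][RIGHT] >= Q[s][LEFT] else LEFT
                 for s in range(N_STATES)]
print(greedy_policy)   # the learned policy heads right from every state
```

The key line is the update toward `reward + gamma * max(Q[next_state])`; in the full DQN, the same target trains the network's weights by gradient descent, with an experience-replay buffer and a separate target network added to keep the training stable.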


  1. Kram1032

    November 22, 2015 at 1:44 pm

    I also love this paper! I'm curious how far they'll be able to push this.

  2. glowies

    November 22, 2015 at 1:46 pm

    Awesome as always!

  3. Erika Furudo

    November 22, 2015 at 2:32 pm

    Is the deep Q-learning program available anywhere?

  4. Jozef SK

    November 22, 2015 at 4:42 pm

    I heard they are going to try this on humanoid robots for bipedal walking.

  5. Peter Henry

    December 31, 2015 at 4:55 pm

    I implemented a Forex version; let me know what you think!

  6. tasman_devil

    January 1, 2016 at 8:55 pm

    You didn't know that you can break them from behind??? Eh man, I did that on first try.

  7. zokna

    February 25, 2016 at 9:35 am

    "Whoever knows a lot, but can't use it to achieve anything useful, is not an intelligent being but an encyclopaedia."
    — Károly Zsolnai-Fehér

  8. Oldeuboi

    May 15, 2016 at 5:17 pm

    Holy shit, we've just made ourselves obsolete.

  9. isectoid

    June 2, 2016 at 8:24 pm

    Bro, the 'ball behind the bricks' trick is among the oldest in the book.

  10. Mykola Zekter

    October 30, 2016 at 5:27 pm

    You didn't describe how it all worked together, just the results.

  11. Brian

    December 6, 2016 at 3:21 am

    just imagine using this for the stock market, have it research every event in government, business, all that. this could make so much money in the stock market

  12. Jon Wallace

    December 21, 2016 at 2:34 am

    You said in the video that it knows the score. How is that, is it being fed the score or is it watching the screen? If so, it probably has no idea what the score is, only that it's changing.

  13. filipe1silva

    January 11, 2017 at 3:50 pm

    3:19 the ball is behind the bricks before the tunnel was open, WTF?

  14. Martin Dinov

    July 30, 2017 at 8:00 am

    There are actually some other implicit priors (of knowledge) given to the network. When in the last layer of the network all the possible actions are pre-specified, this sets up the problem to be more readily solved. What if we gave the network a great deal more of outputs it could generate (corresponding to different key presses) and it had to learn which keys are also the right ones? This would, in some sense, make this closer to the way humans learn to play these games, no? Though, thus increasing the action space considerably and adding irrelevant actions (that don't do anything) I wonder how well DQN would learn.

  15. Sarcasm

    August 3, 2017 at 11:20 pm

    I wonder if it would start to play Overwatch now, what would be its main pick 6 months from now.

    I'm guessing Sombra

  16. naisanza

    August 12, 2017 at 11:15 am

    Are different CNN feature extraction layers needed for different games? How would one CNN know what is a positive label from one game is to another?

  17. Bikram Baruah

    August 27, 2017 at 4:44 am

    So the input is just a few thousand snapshots of a human playing the game?

  18. Orian de Wit

    November 11, 2017 at 10:33 pm

    It is incredible that this video is 2 years old, and now we have an AI which beats DOTA2 players at the highest competitive level.

  19. Arthur Minduca

    March 31, 2018 at 12:33 am

    Hi! Great video! I see you use Standard Youtube License instead of Creative Commons. Would you authorize me to show this video on a non-profit presentation at work for others employees that I will do about Machine Learning? Of course proper attribution would be given. Thanks!

  20. Rooster HAX

    July 31, 2018 at 9:02 pm

    True AGI!!!!1
