How to Use Q-Learning in Video Games Easily

Hello World! It's Siraj, and today we'll teach a bot to get from point A to point B using a special type of reinforcement learning called Q-learning. Reinforcement learning means learning by interacting with an environment through positive feedback, or reinforcement. It's similar to how you give a dog a treat, but only if it rolls over, and it's evolved over the past few decades.

In the late 1950s, an American mathematician named Richard Bellman was trying to solve what he called the optimal control problem: the problem of designing an agent to optimize some measure of a system's behavior over time. Eventually he and his colleagues discovered a possible solution, which was later called the Bellman equation. It describes the value of a decision problem at a certain point in time in terms of the payoff from the initial decisions, plus the value of the remaining decision problems that result from those decisions, expressed through the system's various states. In this way, it broke the problem down into simpler subproblems. The Bellman equation is now used in many, many fields. It helps with minimizing flight time for airplanes, maximizing profits for hedge funds, and maybe even minimizing the beef that Soulja Boy seems to have with everyone.
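The video describes the equation only in words; in standard modern notation (the symbols here are the conventional ones, not shown in the video), the Bellman equation for the value of a state s reads:

```latex
V(s) \;=\; \max_{a}\left[\, R(s, a) \;+\; \gamma \, V(s') \,\right]
```

Here R(s, a) is the payoff for taking action a in state s, s' is the state that results, and the discount factor γ, between 0 and 1, weights the value of the remaining decision problem. The first term is the payoff of the immediate decision; the second is the value of the subproblem that remains, which is exactly how the equation breaks a problem into simpler subproblems.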
Bellman's work was making waves throughout the math community, but decades earlier a psychologist named Edward Thorndike had already been trying to understand how learning works by studying the animal kingdom. He came up with what he called the law of effect, which states that responses that produce a satisfying effect in a particular situation become more likely to occur again in that situation, and responses that produce a discomforting effect become less likely to occur again in that situation. Thanks, Captain Obvious. No, this was actually a pretty important discovery. One of his experiments was putting a cat in a wooden box and observing it while it tried a bunch of different ways of getting out, until it finally hit the lever that opened the box. When he put the cat back in the box, it immediately knew to hit the lever to get out, and it learned that through the process of trial and error, which is what Thorndike was
essentially describing in the law of effect.

Decades later, in 1989, a British computer scientist named Chris Watkins thought that perhaps these two ideas could be combined to create a new type of learning algorithm: designing an agent that optimizes the behavior of a system over time, like the Bellman equation, and does so through the process of trial and error, similar to the law of effect. And so he invented a novel reinforcement learning technique he called Q-learning.

So what is Q-learning? Let's say we had five
rooms in a building connected by doors, and we'll just think of everything outside the building as one big room; all of space-time is room five. We can think of this system as a graph: each room is a node, and each door is a link. Room one, for example, has doors to both room five and room three, so those nodes are connected. Our goal is to put an agent in any room and have it learn, through trial and error, how to get to room five, so room five is the goal. We can associate a reward value with each door, the link between nodes: doors that lead immediately to the goal room get an instant reward of 100, and doors not directly connected to the goal room get zero reward. In Q-learning, the goal is to reach the state with the highest reward through a set of actions. So each room is a state, each action is represented by an arrow, and the mapping of state to action is the agent's policy. The agent uses the reward value as a signal to improve its policy over time, and it stores what it has learned through experience in what's called the Q matrix: the rows represent the possible states, and the columns are the possible actions leading to the next state. It updates the Q matrix over time as it learns the best actions to maximize the reward.
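To make the room example concrete, here is a self-contained Python sketch of it. The video doesn't show this code; the room graph and the 0/100 rewards follow the narration above, and the discount factor of 0.8 is an illustrative choice. It uses the simplest, learning-rate-free form of Watkins' update, Q(s, a) = R(s, a) + γ · max Q(s', a').

```python
import random

# Rooms 0-4, with the outside world as room 5 (the goal).
# rewards[s][a]: reward for using the door from room s into room a;
# doors leading into the goal room are worth 100, all others 0.
rewards = {
    0: {4: 0},
    1: {3: 0, 5: 100},
    2: {3: 0},
    3: {1: 0, 2: 0, 4: 0},
    4: {0: 0, 3: 0, 5: 100},
    5: {1: 0, 4: 0, 5: 100},
}

gamma = 0.8  # discount factor (illustrative)
Q = {s: {a: 0.0 for a in doors} for s, doors in rewards.items()}

random.seed(1)
for episode in range(1000):
    s = random.randint(0, 5)                 # drop the agent in a random room
    while s != 5:                            # wander until it reaches the goal
        a = random.choice(list(rewards[s]))  # try a random door (trial and error)
        # Watkins' update in its learning-rate-free form:
        Q[s][a] = rewards[s][a] + gamma * max(Q[a].values())
        s = a

# After training, greedily following the highest-Q door traces a best path:
s, path = 2, [2]
while s != 5:
    s = max(Q[s], key=Q[s].get)
    path.append(s)
print(path)
```

Starting from room 2, the greedy walk reaches room 5 in three hops; there are two equally good routes (via room 1 or room 4), and which one is printed depends only on dictionary ordering.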
Seems pretty useful, right? So Q-learning's got to be used everywhere in video games, right? Well, consumer video game bots need to be good, but not so good that a human couldn't beat them, while the bots that have used Q-learning to master games like chess, checkers, and most recently Atari games become insanely good at whatever they play. Academic AI learns; consumer game AI generally just makes educated guesses. It doesn't really learn, and its actions are all scripted. But as different as they are, the two fields are converging as we discover more about machine learning. For example, in Forza Motorsport you can create a Drivatar for yourself, an AI that learns how you drive by observing you and can then imitate your driving style. Adaptive behavior like this will make games more interesting, and there's a lot of potential for more of it.

So let's write out a ten-line, high-level Python script
that uses Q-learning to train a bot to get from point A to point B. This game is a 5-by-5 grid. Our agent is a yellow square, and its goal is to find its way to the green square or the red square, either of which ends the game. Each cell represents a state the agent can be in, and there are four actions: up, down, left, and right. Moving a step gives us a reward of negative 0.04, the red cell gives us negative one, and the green cell gives us positive one, so ideally we want to get to the green cell every time. The game world is already built for us, so we'll just start off by importing it at the top. Then, in our main function, we can go ahead and create a while loop set to True, because we want our agent to run indefinitely. Next, we'll initialize our bot's position from the World class and assign it to the agent variable. So now we want our bot to pick the right action to take in the game world, and the question is: how do we
decide that?

(rapping) I got a grid of squares measured five by five,
and I'ma get to green in one piece, alive.
I'ma make a Q matrix, initialize,
keeping every single reward and Q value archived.
Then I'ma pick an action straight out of Q,
then I'm gonna do it, yeah, it's just fresh and new,
update Q with the reward on the beat,
and once I got that, I'll go ahead and repeat, yeah.

We'll use our bot's position as the parameter for the max_Q function, which will choose an action from our Q matrix along with a potential reward. Then we can perform that action by passing it to the do_action method, which will return our bot, the action we took, the reward we received, and the bot's updated position after taking the action. Now we're ready to update our Q matrix, so we'll use the updated bot's position as the parameter, and we'll print both parameters to the terminal so we can observe the results.
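The transcript only narrates that script, so here is a self-contained sketch of the same loop with the grid world reimplemented inline. The -0.04 step reward and the ±1 terminal rewards follow the narration; the positions of the green and red cells, the max_Q and do_action names (borrowed from the narration), and the learning-rate, discount, and exploration settings are all assumptions for illustration, not the video's actual code.

```python
import random

SIZE = 5
GREEN, RED = (4, 4), (4, 3)   # terminal cells; this layout is an assumption
ACTIONS = {'up': (0, -1), 'down': (0, 1), 'left': (-1, 0), 'right': (1, 0)}
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # illustrative hyperparameters

# One Q value per (state, action) pair: 25 states x 4 actions.
Q = {(x, y): {a: 0.0 for a in ACTIONS}
     for x in range(SIZE) for y in range(SIZE)}

def do_action(state, action):
    """Take one step, clamped to the grid; return (next_state, reward, done)."""
    dx, dy = ACTIONS[action]
    nxt = (min(max(state[0] + dx, 0), SIZE - 1),
           min(max(state[1] + dy, 0), SIZE - 1))
    if nxt == GREEN:
        return nxt, 1.0, True
    if nxt == RED:
        return nxt, -1.0, True
    return nxt, -0.04, False   # the walk reward from the video

def max_Q(state):
    """Greedy action for a state, along with its Q value."""
    action = max(Q[state], key=Q[state].get)
    return action, Q[state][action]

random.seed(0)
for episode in range(3000):
    state = (random.randrange(SIZE), random.randrange(SIZE))
    if state in (GREEN, RED):
        continue
    done = False
    while not done:
        # Epsilon-greedy: mostly exploit the Q matrix, sometimes explore.
        if random.random() < EPSILON:
            action = random.choice(list(ACTIONS))
        else:
            action, _ = max_Q(state)
        nxt, reward, done = do_action(state, action)
        target = 0.0 if done else max_Q(nxt)[1]
        Q[state][action] += ALPHA * (reward + GAMMA * target - Q[state][action])
        state = nxt

# Follow the learned policy greedily from the top-left corner.
state = (0, 0)
for _ in range(2 * SIZE * SIZE):
    action, _ = max_Q(state)
    state, reward, done = do_action(state, action)
    if done:
        break
print("greedy rollout ends at", state)
```

After training, the greedy rollout should end on the green cell, and the Q value for stepping into the red cell should have gone negative, which is the "carrot and stick" behavior one commenter asks about below.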
We'll run the script by typing "python Learner.py" into the terminal, and it'll pop up as a GUI. The bot will immediately start trying out possible paths to get to the green square, and we can observe the score in the terminal improving over time. This bot in particular gets really good really fast; within about ten seconds it's already found the ideal path, and it's just going to keep taking it.

So, to break it down: reinforcement learning is the process of learning by interacting with an environment through positive feedback. Q-learning is a type of reinforcement learning that optimizes the behavior of a system over time through trial and error, and it does this by updating its policy, which is a mapping of state to action, based on a reward. The coding challenge for this video is to modify this code so that the game world is bigger and has more obstacles; let's make it harder for our Q-learning bots to find the optimal strategy. Details are in the next video. For now, I've got to go optimize my life, so thanks for watching.

1. Joseph K J

Hey Siraj, what's the editor that you are using?

3. Mohamed Mnete

By the way I was thinking on creating a tic tac toe game using keras to do this q learning.

4. Hitesh Vaidya

Traceback (most recent call last):
  File "Learner.py", line 23, in <module>
    Q[(i, j)][action] = w
KeyError: (4, 1)

This is what I get after running the program shared by siraj via github link. Can anyone help?

5. Pranav Gaur

Hi @Siraj Rawal,

Great tutorial and it surely helped me a lot. I was wondering how this approach can be used for a continuous state space(if that's the right term to use)? For eg. A race car finding its way around a track.
If possible, can you do a tutorial on that too? 😛

6. Sam Arnold

Dude Siraj! you are the man! thank you for your videos!

8. Matthew Felgate

Nice video. Is it correct to call it Reward? Isn't it both Rewarded and Punished. Like both a Carrot AND Stick method of learning?

9. Matthew Felgate

What's the reasoning behind making the value of walk_reward -0.04? I mean, why was this value chosen? Could you write something to optimise this number?

10. Anal Khan

hey can u plz make a video for sarsa algorithm.

11. qiman wang

Markov chains

13. Lloyd Sloan

What IDE do you use for python?

14. danny

Siraj, your eyebrows are amazing. And I think learning rate selection can be really important depending on your application of the decision process.

15. BK

At first I was annoyed by the level of excitement, but then I realized it made it way easier to pay attention to than my lectures. Thanks for that extra effort!

16. 张航

Rather good. Thank you for sharing!

17. Kumar Developer

Started AI and ML Recently. Thank God for finding you guys soon. I do have some queries to you. How could I contact you ? Any Forums ? How could I start with Q Learning ? What is the basic environmental Steps needed ?

18. swag gamer

can you make an appnana bot

Man, this is just amazing. Every single topic of every single subject must be like this… fun, interactive, fascinating.

20. Aidan Gomez

this seems like a markov chain

21. Abhinav a.k.a Demkeys

You're a fucking legend!!! Awesome video!

22. C C

Seems like the wrong image at 3:15; the image shows the R matrix, not the Q matrix

23. Frenchpet

Can you do one for Atari games?

24. WildLachii

put speed to 0.75 and thank me later

25. Azhar Khan

Machine learning with the rap!
Now I have seen everything 🙂
Very innovative Siraj. I enjoy your videos.

26. Gautham Venkataraman

this guy is underrated asf

27. hamtsammich

Hey Siraj, how would you go about recreating the image generation used in https://youtu.be/-wmtsTuHkt0 ?

28. dLoLe

Entertaining and fun. Not really useful if you actually want to understand it and learn something

29. Amber 180

You legit have a cool science show! Wait, did you just rap??

30. Arul Kajan

Dear Siraj
can i have your contact detail

31. Daniel Astillero

can't really argue w/ that sick rappin'

32. APB

is that possible with windows 10?

33. James Ballari

Hi raj, great video as always 🙂 I wrote my self with the help of the internet a python code to implement Tic Tac Toe game with Reinforcement Learning ( Q- Learning), it learns to play the game by playing with itself and learning from the outcomes and gets better at it.
Here's the github link for the same:
https://github.com/jamesq9/Tic-Tac-Toe-Machine-Learning-Using-Reinforcement-Learning

34. dLoLe

There are a lot of videos on youtube that explain this topic really well but this is not one of them. It doesn't really help you understand the idea behind it but only briefly outlines it with a few buzzwords and makes the thing seem more complicated than it really is

I don't want to spread negativity but I sincerely believe that these videos are just clickbait with some buzzwords mixed with some forced jokes, without any useful information that you could use to really learn something.

35. A Achyut

There's Physics(graduate level) on the blackboard not math

36. edantonio505

Great video! thanks!

37. Dong Wang

Thanks Siraj for this series of awesome tutorial, the rap is great, I don't know you have the talent of rap before, go for American Talents!

38. Harshit Agarwal

at 3:16, he should be showing the Q matrix but he's showing the R matrix

39. oshaho rana

plz speak slowly, so that non-native English speakers can catch your words

I'm too late but here's my challenge's code:
https://github.com/McOussKing/q_learning_demo

41. Fast Solution

where to learn A.I from scratch for my unity games

so awesome

43. Eduardo Martín

holy moly man, please…you can teach a bit slower… xD good job anyways

44. micsodaki

I still don't understand how can we set the Q matrix in case of e.g. games where we have no idea whether an action is good or not.

45. MonmonstarTV

This dude is probably the most cringy tutor I have seen, but he explains everything very well

46. Amairani Coba

Hi siraj, I have a question. This helps to get from point A to B. Now let's say I want to cover each space in the grid as my goal, is qlearning a way to do it? Or what do u Suggest? Great videos. 😊😊

47. Jordan Shackelford

Teach me how to optimize my life pleasee

48. William John

print "create siraj! you have points to assign to python, tensorflow, neural network, and iq."
name = raw_input("Siraj Hair ")
points = 30
stats = {"python": 0, "tensorflow": 0, "neural network": 0, "iq": 180}
while True:
    print
    print "you have", points, "points left."
    print """
    1-add points
    2-take points
    3-see points per attribute
    4-exit
    """
    choice = raw_input("bla bla bla ")
    if choice == "1":
        attribute = raw_input("which attribute? ")
        if attribute in stats:
            add = int(raw_input("how many points? "))
            if 0 < add <= points:
                stats[attribute] += add
                points -= add
                print name, "now has", stats[attribute], attribute, "points."
            else:
                print "invalid number of points."
        else:
            print "invalid attribute."
    elif choice == "2":
        attribute = raw_input("which attribute? ")
        if attribute in stats:
            take = int(raw_input("how many points? "))
            if 0 < take <= stats[attribute]:
                stats[attribute] -= take
                points += take
                print name, "now has", stats[attribute], attribute, "points."
            else:
                print "invalid number of points."
        else:
            print "invalid attribute."
    elif choice == "3":
        for attribute in stats:
            print attribute, "-", stats[attribute]
    elif choice == "4":
        if points == 0:
            break
        else:
            print "assign all your points first."
    else:
        print "invalid choice."
print "congrats! you're done designing " + name + "."
for attribute in stats:
    print name, "has", stats[attribute], attribute, "points."

49. Bob dabiuld

The pictures (and some content) are literally copied from https://www.cs.cmu.edu/~epxing/Class/10701/slides/lecture21.pdf

50. VibeX MaTxeMa

thanks . it worked.. great help!!!!

51. Kinos141

Q learning looks like GOAP AI. I could be wrong.

hii Siraj,

I have recently been working on a project called a Raspberry Pi cluster computer.
I have built a small 6-node Raspberry Pi cluster; can I use that cluster for Q-learning to make it learn a Mario game?

thank you.

53. Lolpop HD

I cant do the import world pls help

54. Mekaveli.

I think you should stop using words like 'ez' because it's not , especially for beginners.

55. gsrini

how did you choose the reward as -0.1

the corn is strong in this one

57. Winston Mcgee

will you stop adding unnecessary things you think you're funny? just get to the point you fucking indian

58. David M. Johnston

Can I give you an opinion?
But to me it seems like there's a conflict in the audience you target. On the one hand, you're speaking to developers who like to think in a structured and very logical way, but on the other hand, your quick paced style of editing, intercut with funny clips, seems made to attract passive viewers who only stay here because it moves a lot but don't really listen.
This is fine, but at the same time it can be distracting for the focused viewer who really tries to deeply understand the core concepts. I'm just pointing this out because sometimes it's hard to pinpoint what is good and what is bad from the inside, and an outsider's opinion can help.

59. Jack Currie

Siraj, your videos are so excellent that they have driven me to make my first ever comment on Youtube. Just wanted to tell you to keep it up! You strike the perfect balance between between helpfulness and entertainment. And don't listen to the haters, keep rapping 🙂

60. xXSillyBilly79Xx

Cool information but the platform makes me cringe.

61. Paul Ahrenholtz

5:14
i ain't ran from nothing but the loss funct…
yea..
never running from nothing but the loss funct..

NORF NORF

62. Luciano Martinez

He's so excited omg

63. Bram Dhont

I thought everything was funny and cool but than comes the rapping

64. Ashutosh pandey

Bro you are simply awsome with lots of stuffs you came up with,presentation style and everything is incredible!!!!! ….

65. Dane Lee

hhhhh I'm speechless at this video, awesome!

66. Tristan W

I've never seen someone gesticulate so much.

67. Vinil Vasani

I'm a novice in the field of machine learning and have some experience with different frameworks. I have a simple project in mind that I want to execute. I want to use machine learning to set signal light timers (especially green light duration) based on the number of cars (for a 4-way junction). Basically a dynamic traffic light which will set a dynamic timer for each red light based on the number of cars at that red light. What kind of model should I use for this? A link to relevant resources would be very useful as well.

68. Zylar Gray

You…. missed the perfect opportunity to *insert Christopher Walken meme here*…

69. Savior NT

When the cat was in the box, and it was not being observed, was the cat dead, alive, or did it exist in both states simultaneously?

70. GameCasters

are there any q learning videos for 3d video games in android? that's what I'm interested in tbh

71. GameCasters

and i havent found anything on youtube talking about q learning for android games

http://mnemstudio.org/path-finding-q-learning-tutorial.htm
This might help!

73. Paulo Jose Castro

meh, why wasn't Markov mentioned?
also I made a blog for a different approach to Q-learning:
https://planktonfun.github.io/q-learning-js/

74. 1Rennae2

Is this what is going on in the chans…as in 4chan & 8chan? Sure sounds like it to me. 🙂

75. Saeed Ahmed

As a Beginner, I found it useful… I will continue to learn from here…

76. 16hbm

Could u please do a video on Luna Lander implementing q -learning??

77. Rami M

how can anybody understand this!!

78. Сергей Лютько

Just awesome!

79. Vu Coc

It's great! I am just a first-year student and I am trying to study IT. Your video makes my problem easier. I hope you make a lot of videos about IT and learning! (Y) Better luck in the future

82. BlindfoldChess

Yo ! RL in golang.

83. Thiago Araujo

This is just awesome hahaha

hey can anyone tell me where can i find the whole package of the demo? If it is on GitHub please send me the link bros ! thanks

85. paul antony

How do we apply Q-learning to a turn-based game (games like tic tac toe, chess, etc.)? Should max Q(s', a') be the max Q value on the opponent's turn?

86. Sukanya Murali

I installed the "World"… But still I m getting the error as ModuleNotFoundError: no module named 'World'.

87. Ahm Zahid Hossain

I am using anaconda 3 python env, was not able to install TKinter. Tried all different ways. If anyone can help me …

88. lemon274

Im sad I can't seem to import the World library after pip installing it :

89. Jan michael Besinga

I set the playback speed to 0.75 since he started rapping.

How to make, ai learn pubg😋😋

91. Eshaan Barkataki

Thx, Great Video. First youtube programmer I saw that uses meme pics. Appreciate that. 🙂

(Ps: thx for putting the source code too. Helped a lot)

92. Nuno Soares

Cat: It might be this lever (translated from meow)
Person: puts back on the box
Cat: I HAVE NO CLUE (translated from meow)

93. kenivia

THAT OUTRO WAS SMOOTH

94. Ricardo Lerch

why does it always choose the same path, being others with the same reward?

95. Roland Lawrence

watching this video series has changed my life 🙂 amazing stuff! coming from an old SQL programmer…

96. John Nesbit

the code is so high level it made things more confusing until I read a lower level script plase stp wit da high levl

okay… what?

98. aryan chauhan

This is so cool whenever some visual action is there I am in

99. Tarvis

why are there only 5 states in the final code if there is 25 possible states?

cringe