## Information Theory part 10: What is a Markov chain?

When observing the
natural world, many of us notice a somewhat
beautiful dichotomy. No two things are
ever exactly alike, but they all seem to follow
some underlying form. And Plato believed that the
true forms of the universe were hidden from us. Through observation
of the natural world, we could merely acquire
approximate knowledge of them. They were hidden blueprints. The pure forms were
only accessible through abstract reasoning of
philosophy and mathematics. For example, the circle. He describes as that
which has the distance from its circumference to
its center everywhere equal. Yet we will never find
a material manifestation of a perfect circle, or a
perfectly straight line. Though interestingly,
Plato speculated that after an uncountable
number of years, the universe will
reach an ideal state, returning to its perfect form. This platonic focus on
an abstract pure forms remained popular for centuries. And it wasn’t until
the 16th century, when people tried to
embrace the messy variation in the real world, and apply
mathematics to tease out underlying patterns. Bernoulli refined the
idea of expectation. He was focused on a method
of accurately estimating the unknown probability
of some event, based on the number of
times the event occurs in independent trials. And he uses a simple example. Suppose that without
and 2,000 dark pebbles are hidden in an urn. And that, to determine the
ratio of white versus black by experiment, you
draw one pebble after another, with replacement,
and note how many times a white pebble is
drawn versus black. He went on to prove that
the expected value of white versus black observations will
converge on the actual ratio, as the number of
trials increases. Known as the Weak
Law of Large Numbers. And he concluded by saying,
if observations of all events be continued for
the entire infinity, it will be noticed that
everything in the world is governed by precise ratios
and a constant law of change. This idea was quickly
extended, as it was noticed that
not only did things converge on an expected
average, but the probability of variation away
from averages also follow a familiar underlying
shape, or distribution. A great example of this is
Francis Galton’s bean machine. Imagine each collision as
a single independent event, such as a coin flip. And after 10
collisions, or events, the bean falls into a
bucket, representing the ratio of left
versus right deflection. Or heads versus tails. And this overall
curvature, known as a binomial
distribution, appears to be an ideal form, as it
kept appearing everywhere, any time you looked
at the variation of a large number
of random trials. It seems the average
fate of these events is somehow predetermined,
known today as the Central Limit Theorem. But this was a dangerous
philosophical idea to some. And Pavel Nekrasov, originally
a theologian by training, later took up mathematics,
and was a strong proponent of the religious
doctrine of free will. He didn’t like the idea of
us having this predetermined statistical fate. And he made a famous
claim, that independence is a necessary condition for
the law of large numbers. Since independence just
describes these toy examples using beans or dice, where
the outcome of previous events doesn’t change the probability
of the current or future events. However, as we all can
relate, most things in the physical
world are clearly dependent on prior outcomes. Such as the chance of
fire, or sun, or even our life expectancy. And when the probability
of some event depends, or is conditional
on previous events, we say they are dependent
events, or dependent variables. So this claim angered another
Russian mathematician, Andre Markov, who maintained
a very public animosity towards Nekrasov. He goes on to say in a letter
that, this circumstance prompts me to explain, in a
series of articles, that the law of
large numbers can apply to dependent variables. Using a construction,
which he brags, Nekrasov could not
even dream about. So Markov extends Bernoulli’s
results to dependent variables, using an ingenious construction. Imagine a coin flip, which
isn’t independent, but dependent on the previous outcome. So it has a short term
memory of one event. And this can be visualized using
a hypothetical machine, which contains two cups,
which we call states. In one state, we have a 50/50
mix of light versus dark beads. While in the other state, we
have more dark versus light. Now one cup we can
call state zero. It represents a dark
having previously occurred. And the other state,
we can call one, it represents a light bead
having previously occurred. Now to run our machine, we
simply start in a random state, and make a selection. And then we move to
either state zero or one, depending on that event. And based on the outcome
of that selection, we output either a zero, if it’s
dark, or a one, if it’s light. And with this
two-state machine, we can identify four
possible transitions. If we are in state zero,
and a black occurs, we loop back to the same
state and select again. If a light bead is
selected, we jump over to state one, which can
also loop back on itself, or jump back to state
zero, if a dark is chosen. The probability of a light
versus dark selection is clearly not
independent here, since it depends on the previous outcome. But Markov proved that as long
as every state in the machine is reachable, when you run
these machines in a sequence, they reach equilibrium. That is, no matter
where you start, once you begin the
sequence, the number of times you visit
each state converges to some specific
ratio or probability. Now this simple example
disproved Nekrasov’s claim that only independent
events could converge on predictable
distributions. But the concept of
modeling sequences of random events using states,
and transitions between states, became known as a Markov chain. And one of the first, and
most famous applications of Markov chains, was

Thaaaaaanks

2. #### MJ Cruise

Thanks for educating someone over 50!! Fantastic

3. #### D Andersen

This series is really quite brilliant. I'm an Information Science Major and this series ties so much of what I have studied together in one picture. ♥ it

4. #### YumekuiNeru

Awh. It ended at the part that looked really good

5. #### MusicBent

These clif hangers get me every time!

6. #### pebre79

Yes! I've been waiting for markov chains on KA!

7. #### pebre79

that cliffhanger got me gah!! lol

8. #### Chris Liffrig

Wow. And damn u for the abrupt ending

nice one!!

10. #### hookah604

Brilliant as always!

good job

12. #### Purav Patel

7:00 = "Transision matrix" is misspelled. It should be "Transition". Otherwise, I really enjoyed the video!

13. #### MrBumbo90

Nietzsche was wrong.

14. #### MusicBent

I'm already quite familiar with information theory. I only meant to compliment the high quality of the series, both content and presentation.

15. #### Holger K. von Jouanne-Diedrich

Which simulator for the Markov chain did you use at 6:49?

17. #### baby turtle

You guys are awesome! The fact that you do not even have ads in the begining. The world needs more people like you. Briliant work! 🙂

19. #### Nareg Deraveneseian

This blew up my mind! Just Amazing!

superb

21. #### Gego/XAREN ゲゴザレン

And this got us the super compression algorithms LZMA and LZMA2.
🙂

22. #### YuKoNLiFe

What the fuck are you smoking

23. #### Mikey Lin

good background knowledge. thanks for sharing.

24. #### P G

Are you Canadian by any chance ? The way you say "about" or "out"… 😉

25. #### TheResonating

@Art of the Problem so wait what does the binomial distribution have to do with markov chains?

26. #### Jamie Yakscoe

where is a good place to go if i want to learn more about markov chains in protein synthesis? where did you guys get the picture from @7:06 ?

27. #### Snomi94

3:05
"-this overall curvature, known as the binomial distribution…"
The distribution of y/n trials is binomial.
The curves drawn are a normal/Gaussian distribution 🙂
(Which of course the binomial distribution approaches for high n as per the central limit theorem)

28. #### Crayon Chaney

6:17 why the probability of state 1 loop back to state 1 is 0.65? I thought state 1 is 50/50 light versus black.

Awesome explanation….! Thanks for elaborating.

30. #### Cameron Kirk

Wow, superb video. this channel needs more subscribers lol

31. #### vaibhav agarwal

I like how you introduce this, philosophical.

32. #### Xpry

Why does this guy's voice crack so much?

33. #### jpm93

Great video, very interesting, but please lose the creepypasta music. Maybe substitute it for something?

34. #### jpm93

What did you use to construct the animation at 6:15 ? 🙂

35. #### Ben S

These videos are so great! Art indeed.

36. #### Israr ullah

allah has created this world and mohammad(pbu) is the messanger of allah

37. #### Dylan Barth

3.5 minutes in and you've not even touched on what a Markov Chain is. Is this allude to Plato really necessary? It seems tangential at best, and outright misrepresenting his philosophy at worst. And yes the history of how we got to modern probability theory is important, but is it worth going into to describe this one process? I'd like to know why you included so much history that really seems to divert attention away from the topic, maybe I've missed the point entirely.

38. #### jaydev kalivarapu

Lot of material in this video is taken from Khan Academy introduction video. Well explained though.

39. #### PCL

Seriously, man! Excellent vid!

beautiful.

41. #### Martin Milter Bay

Good video. Good mix of historical and mathematical information. Thanks

42. #### Naomi Garcia

This is the best explanatory video of this sort I have seen! Excellent work and thank you for helping our community!!! I will definitely be passing this along.

43. #### Paulo Jose Castro

history repeats itself

44. #### Jorge Olea

Lots of statistical terminology being used
Central limit theorem
Binomial distribution

45. #### A Beaumont

Best video ever!