Information Theory part 10: What is a Markov chain?

When observing the natural world, many of us notice a beautiful dichotomy: no two things are ever exactly alike, yet they all seem to follow some underlying form. Plato believed that the true forms of the universe were hidden from us; through observation of the natural world we could acquire only approximate knowledge of them. They were hidden blueprints. The pure forms were accessible only through the abstract reasoning of philosophy and mathematics. The circle, for example, he described as that which has the distance from its circumference to its center everywhere equal. Yet we will never find a material manifestation of a perfect circle, or a perfectly straight line. Interestingly, though, Plato speculated that after an uncountable number of years the universe would reach an ideal state, returning to its perfect form. This Platonic focus on
abstract, pure forms remained popular for centuries. It wasn't until the 16th century that people began to embrace the messy variation of the real world and to apply mathematics to tease out its underlying patterns. Jacob Bernoulli refined the idea of expectation, focusing on a method for accurately estimating the unknown probability of an event based on the number of times the event occurs in independent trials. He used a simple example: suppose that without
your knowledge, 3,000 light pebbles and 2,000 dark pebbles are hidden in an urn, and that, to determine the ratio of light versus dark by experiment, you draw one pebble after another, with replacement, and note how many times a light pebble is drawn versus a dark one. He went on to prove that the observed ratio of light to dark converges on the actual ratio as the number of trials increases, a result now known as the weak law of large numbers. He concluded that if observations of all events were continued for all eternity, it would be noticed that everything in the world is governed by precise ratios and a constant law of change. This idea was quickly
extended, as it was noticed that not only do things converge on an expected average, but the variation away from that average also follows a familiar underlying shape, or distribution. A great example of this is Francis Galton's bean machine. Imagine each collision as a single independent event, such as a coin flip. After 10 collisions, or events, the bean falls into a bucket representing the ratio of left versus right deflections, or heads versus tails. This overall curve, known as the binomial distribution, appears to be an ideal form, for it kept appearing everywhere any time you looked at the variation of a large number of random trials. It seems the average fate of these events is somehow predetermined, an idea known today as the central limit theorem. But this was a dangerous
philosophical idea to some. Pavel Nekrasov, a theologian by training who later took up mathematics, was a strong proponent of the religious doctrine of free will. He didn't like the idea of our having a predetermined statistical fate, and he made a famous claim: that independence is a necessary condition for the law of large numbers. His reasoning was that independence describes only toy examples using beans or dice, where the outcome of previous events doesn't change the probability of current or future events. However, as we can all relate, most things in the physical world are clearly dependent on prior outcomes, such as the chance of fire, or sun, or even our life expectancy. When the probability of some event depends on, or is conditional on, previous events, we say they are dependent events, or dependent variables. This claim angered another
Russian mathematician, Andrey Markov, who maintained a very public animosity towards Nekrasov. In a letter, Markov wrote that this circumstance prompted him to explain, in a series of articles, that the law of large numbers can apply to dependent variables, using a construction which, he bragged, Nekrasov could not even dream about. Markov extended Bernoulli's result to dependent variables using an ingenious construction. Imagine a coin flip which isn't independent, but dependent on the previous outcome, so that it has a short-term memory of one event. This can be visualized using
a hypothetical machine containing two cups, which we call states. In one state we have a 50/50 mix of light versus dark beads, while in the other state we have more dark than light. One cup we can call state 0; it represents a dark bead having previously occurred. The other cup we can call state 1; it represents a light bead having previously occurred. To run our machine, we simply start in a random state and make a selection. Based on the outcome of that selection, we output either a 0, if it's dark, or a 1, if it's light, and then move to state 0 or state 1 accordingly. With this two-state machine, we can identify four possible transitions. If we are in state 0 and a dark bead occurs, we loop back to the same state and select again. If a light bead is selected, we jump over to state 1, which can also loop back on itself, or jump back to state 0 if a dark bead is chosen. The probability of a light
versus dark selection is clearly not independent here, since it depends on the previous outcome. But Markov proved that as long as every state in the machine is reachable, when you run such a machine in a sequence, it reaches equilibrium: no matter where you start, once the sequence begins, the fraction of times you visit each state converges to some specific ratio, or probability. This simple example disproved Nekrasov's claim that only independent events could converge on predictable distributions. The concept of modeling sequences of random events using states, and transitions between states, became known as a Markov chain. One of the first, and most famous, applications of Markov chains was published by Claude Shannon.
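The two-cup machine can be sketched in a few lines of code. This is a minimal simulation under assumed bead mixes (state 0 holds a 50/50 mix and state 1 holds 65% light; the video's exact proportions may differ): the fraction of time spent in each state converges to the same equilibrium ratio no matter where the chain starts.

```python
import random

# Illustrative assumption: probability of drawing a light bead in each state.
# State 0 = "dark previously occurred", state 1 = "light previously occurred".
P_LIGHT = {0: 0.50, 1: 0.65}

def run_chain(steps, seed=0):
    """Run the machine and return the fraction of visits to each state."""
    rng = random.Random(seed)
    state = rng.choice([0, 1])       # start in a random state
    visits = [0, 0]
    for _ in range(steps):
        # Draw a bead: output 1 (light) or 0 (dark), then move to that state.
        state = 1 if rng.random() < P_LIGHT[state] else 0
        visits[state] += 1
    return [v / steps for v in visits]

# Exact equilibrium: solve pi1 = pi0 * p01 + pi1 * p11 with pi0 + pi1 = 1.
p01, p11 = P_LIGHT[0], P_LIGHT[1]
pi1 = p01 / (1 - p11 + p01)          # = 0.5 / 0.85, about 0.588

print(run_chain(100_000))            # empirical visit frequencies
print([round(1 - pi1, 3), round(pi1, 3)])
```

Changing the starting state, or the seed, changes the path but not the long-run frequencies, which is exactly the equilibrium property Markov proved for chains in which every state is reachable.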


  1. MrBumbo90

    June 26, 2013 at 11:47 pm


  2. MJ Cruise

    June 27, 2013 at 1:04 am

    Thanks for educating someone over 50!! Fantastic

  3. D Andersen

    June 27, 2013 at 1:37 am

    This series is really quite brilliant. I'm an Information Science Major and this series ties so much of what I have studied together in one picture. ♥ it

  4. YumekuiNeru

    June 27, 2013 at 2:02 am

    Awh. It ended at the part that looked really good

  5. MusicBent

    June 27, 2013 at 3:38 am

    These clif hangers get me every time!

  6. pebre79

    June 27, 2013 at 4:26 am

    Yes! I've been waiting for markov chains on KA!

  7. pebre79

    June 27, 2013 at 4:33 am

    that cliffhanger got me gah!! lol

  8. Chris Liffrig

    June 27, 2013 at 4:52 am

    Wow. And damn u for the abrupt ending

  9. tehcno007

    June 27, 2013 at 6:19 am

    nice one!!

  10. hookah604

    June 27, 2013 at 6:37 am

    Brilliant as always!

  11. Ashok Raju

    June 27, 2013 at 9:30 am

    good job

  12. Purav Patel

    June 27, 2013 at 3:15 pm

    7:00 = "Transision matrix" is misspelled. It should be "Transition". Otherwise, I really enjoyed the video!

  13. MrBumbo90

    June 27, 2013 at 3:24 pm

    Nietzsche was wrong.

  14. MusicBent

    June 28, 2013 at 9:50 pm

    I'm already quite familiar with information theory. I only meant to compliment the high quality of the series, both content and presentation.

  15. Holger K. von Jouanne-Diedrich

    July 1, 2013 at 4:17 pm

    Which simulator for the Markov chain did you use at 6:49?


    August 25, 2013 at 4:01 pm

    DUDE your the guy that started the BITCH PLEASE meme!

  17. Saad Saeed

    November 1, 2013 at 12:02 am

    You guys are awesome! The fact that you do not even have ads in the begining. The world needs more people like you. Briliant work! 🙂

  18. Nareg Deraveneseian

    November 6, 2013 at 5:05 am

    This blew up my mind! Just Amazing!

  19. Gego/XAREN ゲゴザレン

    December 7, 2013 at 8:19 pm

    And this got us the super compression algorithms LZMA and LZMA2.

  20. YuKoNLiFe

    March 1, 2014 at 11:24 am

    What the fuck are you smoking

  21. Mikey Lin

    November 26, 2014 at 4:39 pm

    good background knowledge. thanks for sharing.

  22. P G

    March 27, 2015 at 9:26 pm

    Are you Canadian by any chance ? The way you say "about" or "out"… 😉

  23. TheResonating

    May 11, 2015 at 3:46 am

    @Art of the Problem so wait what does the binomial distribution have to do with markov chains?

  24. Jamie Yakscoe

    December 17, 2015 at 2:29 pm

    where is a good place to go if i want to learn more about markov chains in protein synthesis? where did you guys get the picture from @7:06 ?

  25. Snomi94

    April 11, 2016 at 6:47 pm

    "-this overall curvature, known as the binomial distribution…"
    The distribution of y/n trials is binomial.
    The curves drawn are a normal/Gaussian distribution 🙂
    (Which of course the binomial distribution approaches for high n as per the central limit theorem)

  26. Crayon Chaney

    November 17, 2016 at 1:58 pm

    6:17 why the probability of state 1 loop back to state 1 is 0.65? I thought state 1 is 50/50 light versus black.

  27. Chinmay Upadhye

    November 21, 2016 at 7:23 pm

    Awesome explanation….! Thanks for elaborating.

  28. Cameron Kirk

    January 4, 2017 at 3:26 am

    Wow, superb video. this channel needs more subscribers lol

  29. vaibhav agarwal

    January 29, 2017 at 2:44 pm

    I like how you introduce this, philosophical.

  30. Xpry

    January 30, 2017 at 1:21 am

    Why does this guy's voice crack so much?

  31. jpm93

    February 9, 2017 at 3:50 am

    Great video, very interesting, but please lose the creepypasta music. Maybe substitute it for something?

  32. jpm93

    February 9, 2017 at 3:53 am

    What did you use to construct the animation at 6:15 ? 🙂

  33. Ben S

    February 21, 2017 at 4:10 pm

    These videos are so great! Art indeed.

  34. Israr ullah

    April 18, 2017 at 3:05 pm

    allah has created this world and mohammad(pbu) is the messanger of allah

  35. Dylan Barth

    June 12, 2017 at 6:37 pm

    3.5 minutes in and you've not even touched on what a Markov Chain is. Is this allude to Plato really necessary? It seems tangential at best, and outright misrepresenting his philosophy at worst. And yes the history of how we got to modern probability theory is important, but is it worth going into to describe this one process? I'd like to know why you included so much history that really seems to divert attention away from the topic, maybe I've missed the point entirely.

  36. jaydev kalivarapu

    August 11, 2017 at 12:06 pm

    Lot of material in this video is taken from Khan Academy introduction video. Well explained though.

  37. PCL

    August 15, 2017 at 2:35 pm

    Seriously, man! Excellent vid!

  38. Ujjwal Garg

    September 14, 2017 at 11:46 am


  39. Martin Milter Bay

    October 17, 2017 at 5:11 am

    Good video. Good mix of historical and mathematical information. Thanks

  40. Naomi Garcia

    January 25, 2018 at 12:04 am

    This is the best explanatory video of this sort I have seen! Excellent work and thank you for helping our community!!! I will definitely be passing this along.

  41. Paulo Jose Castro

    March 4, 2018 at 9:07 am

    history repeats itself

  42. Jorge Olea

    May 8, 2018 at 7:21 pm

    Lots of statistical terminology being used
    Central limit theorem
    Binomial distribution

  43. A Beaumont

    November 16, 2018 at 4:55 pm

    Best video ever!

  44. umer Arshad

    December 11, 2018 at 11:00 am

    best video i have seen
