## Information Theory part 10: What is a Markov chain?

When observing the natural world, many of us notice a somewhat beautiful dichotomy: no two things are ever exactly alike, yet they all seem to follow some underlying form. Plato believed that the true forms of the universe were hidden from us. Through observation of the natural world we could merely acquire approximate knowledge of them; they were hidden blueprints, and the pure forms were accessible only through the abstract reasoning of philosophy and mathematics. The circle, for example, he describes as that which has the distance from its circumference to its center everywhere equal. Yet we will never find a material manifestation of a perfect circle, or a perfectly straight line. Interestingly, though, Plato speculated that after an uncountable number of years the universe would reach an ideal state, returning to its perfect form.

This Platonic focus on abstract, pure forms remained popular for centuries, and it wasn’t until the 16th century that people tried to embrace the messy variation in the real world and apply mathematics to tease out underlying patterns. Bernoulli refined the idea of expectation, focusing on a method for accurately estimating the unknown probability of some event based on the number of times the event occurs in independent trials. He uses a simple example: suppose that, without your knowledge, 3,000 light pebbles and 2,000 dark pebbles are hidden in an urn, and that, to determine the ratio of light versus dark by experiment, you draw one pebble after another, with replacement, and note how many times a light pebble is drawn versus a dark one. He went on to prove that the observed ratio of light to dark draws converges on the actual ratio as the number of trials increases, a result now known as the Weak Law of Large Numbers. He concluded by saying that if observations of all events be continued for the entire infinity, it will be noticed that everything in the world is governed by precise ratios and a constant law of change.
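Bernoulli’s urn is easy to simulate. The sketch below (illustrative code, not from the original video) draws pebbles with replacement from the 3,000-light / 2,000-dark urn and prints the observed fraction of light draws, which settles toward the true ratio of 0.6 as the number of trials grows:

```python
import random

# Bernoulli's urn: 3,000 light and 2,000 dark pebbles, so the true
# probability of drawing a light pebble is 3000/5000 = 0.6.
P_LIGHT = 3000 / 5000

def observed_ratio(trials, seed=42):
    """Draw `trials` pebbles with replacement; return the fraction of light draws."""
    rng = random.Random(seed)
    light = sum(rng.random() < P_LIGHT for _ in range(trials))
    return light / trials

# The observed ratio converges on the hidden ratio as trials
# increase: the Weak Law of Large Numbers in action.
for n in (10, 100, 10_000, 1_000_000):
    print(n, observed_ratio(n))
```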

This idea was quickly extended, as it was noticed that not only did things converge on an expected average, but the probability of variation away from that average also followed a familiar underlying shape, or distribution. A great example of this is Francis Galton’s bean machine. Imagine each collision as a single independent event, such as a coin flip. After 10 collisions, or events, the bean falls into a bucket representing the ratio of left versus right deflections, or heads versus tails. This overall curvature, known as the binomial distribution, appeared to be an ideal form, since it kept appearing any time you looked at the variation of a large number of random trials. It seems the average fate of these events is somehow predetermined, a result known today as the Central Limit Theorem.
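The bean machine is just as easy to simulate (again an illustrative sketch, not the video’s animation). Each bean’s bucket index is the number of rightward deflections in 10 fair coin flips, so the bucket counts pile up in the bell-shaped binomial distribution:

```python
import random
from collections import Counter

def drop_beans(beans, collisions=10, seed=42):
    """Drop beans through `collisions` rows of pins; the bucket index
    is the number of rightward deflections (heads)."""
    rng = random.Random(seed)
    buckets = Counter()
    for _ in range(beans):
        buckets[sum(rng.random() < 0.5 for _ in range(collisions))] += 1
    return buckets

# Bucket counts follow Binomial(10, 0.5): a symmetric pile
# centered on bucket 5.
counts = drop_beans(10_000)
for bucket in range(11):
    print(f"{bucket:2d} {'#' * (counts[bucket] // 100)}")
```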

But this was a dangerous philosophical idea to some. Pavel Nekrasov, originally a theologian by training who later took up mathematics, was a strong proponent of the religious doctrine of free will. He didn’t like the idea of our having this predetermined statistical fate, and he made a famous claim: that independence is a necessary condition for the Law of Large Numbers, since independence describes only toy examples using beans or dice, where the outcome of previous events doesn’t change the probability of current or future events. However, as we can all relate, most things in the physical world are clearly dependent on prior outcomes, such as the chance of fire, or sun, or even our life expectancy. When the probability of some event depends on, or is conditional on, previous events, we say they are dependent events, or dependent variables.

This claim angered another Russian mathematician, Andrey Markov, who maintained a very public animosity towards Nekrasov. He goes on to say in a letter that this circumstance prompts him to explain, in a series of articles, that the law of large numbers can apply to dependent variables, using a construction which, he brags, Nekrasov could not even dream about. Markov extended Bernoulli’s results to dependent variables using an ingenious construction. Imagine a coin flip which isn’t independent but dependent on the previous outcome, so that it has a short-term memory of one event. This can be visualized using a hypothetical machine containing two cups, which we call states. In one state we have a 50/50 mix of light versus dark beads, while in the other state we have more dark than light. One cup we can call state 0; it represents a dark bead having previously occurred. The other we can call state 1; it represents a light bead having previously occurred.

To run our machine, we simply start in a random state and make a selection. Based on the outcome of that selection, we output a 0 if the bead is dark or a 1 if it is light, and we move to either state 0 or state 1 depending on that event. With this two-state machine we can identify four possible transitions. If we are in state 0 and a dark bead occurs, we loop back to the same state and select again; if a light bead is selected, we jump over to state 1, which can also loop back on itself, or jump back to state 0 if a dark bead is chosen. The probability of a light versus dark selection is clearly not independent here, since it depends on the previous outcome. But Markov proved that as long as every state in the machine is reachable, when you run these machines in a sequence they reach equilibrium: no matter where you start, once you begin the sequence, the fraction of times you visit each state converges to some specific ratio, or probability.
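Markov’s two-cup machine can be run in a few lines of code. The bead mixes below are illustrative stand-ins (the video’s exact proportions aren’t reproduced here): state 0 draws a light bead with probability 0.5, state 1 with probability 0.65. Whatever the starting state, the visit frequencies converge to the same equilibrium:

```python
import random

# Probability of drawing a LIGHT bead in each state (illustrative
# values, not the exact mixes from the video). Drawing a light bead
# moves the machine to state 1, a dark bead to state 0.
P_LIGHT = {0: 0.50, 1: 0.65}

def visit_frequencies(steps, start, seed=42):
    """Run the two-cup machine and return the fraction of time spent in each state."""
    rng = random.Random(seed)
    state, visits = start, {0: 0, 1: 0}
    for _ in range(steps):
        state = 1 if rng.random() < P_LIGHT[state] else 0
        visits[state] += 1
    return {s: v / steps for s, v in visits.items()}

# Both states are reachable, so the chain reaches equilibrium:
# the same visit ratios emerge regardless of the starting state.
print(visit_frequencies(100_000, start=0))
print(visit_frequencies(100_000, start=1))
```

With these stand-in numbers, balancing the flow in and out of state 1 gives an equilibrium probability of 0.5 / (0.5 + 0.35) = 10/17 ≈ 0.588 for state 1, which both runs approach.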

This simple example disproved Nekrasov’s claim that only independent events could converge on predictable distributions. And the concept of modeling sequences of random events using states, and transitions between states, became known as a Markov chain. One of the first, and most famous, applications of Markov chains was published by Claude Shannon.

## MrBumbo90

June 26, 2013 at 11:47 pm

Thaaaaaanks

## MJ Cruise

June 27, 2013 at 1:04 am

Thanks for educating someone over 50!! Fantastic

## D Andersen

June 27, 2013 at 1:37 am

This series is really quite brilliant. I'm an Information Science Major and this series ties so much of what I have studied together in one picture. ♥ it

## YumekuiNeru

June 27, 2013 at 2:02 am

Awh. It ended at the part that looked really good

## MusicBent

June 27, 2013 at 3:38 am

These clif hangers get me every time!

## pebre79

June 27, 2013 at 4:26 am

Yes! I've been waiting for markov chains on KA!

## pebre79

June 27, 2013 at 4:33 am

that cliffhanger got me gah!! lol

## Chris Liffrig

June 27, 2013 at 4:52 am

Wow. And damn u for the abrupt ending

## tehcno007

June 27, 2013 at 6:19 am

nice one!!

## hookah604

June 27, 2013 at 6:37 am

Brilliant as always!

## Ashok Raju

June 27, 2013 at 9:30 am

good job

## Purav Patel

June 27, 2013 at 3:15 pm

7:00 = "Transision matrix" is misspelled. It should be "Transition". Otherwise, I really enjoyed the video!

## MrBumbo90

June 27, 2013 at 3:24 pm

Nietzsche was wrong.

## MusicBent

June 28, 2013 at 9:50 pm

I'm already quite familiar with information theory. I only meant to compliment the high quality of the series, both content and presentation.

## Holger K. von Jouanne-Diedrich

July 1, 2013 at 4:17 pm

Which simulator for the Markov chain did you use at 6:49?

## HARISH PATEL

August 25, 2013 at 4:01 pm

DUDE your the guy that started the BITCH PLEASE meme!

## baby turtle

October 10, 2013 at 12:46 pm

about?

## Saad Saeed

November 1, 2013 at 12:02 am

You guys are awesome! The fact that you do not even have ads in the begining. The world needs more people like you. Briliant work! 🙂

## Nareg Deraveneseian

November 6, 2013 at 5:05 am

This blew up my mind! Just Amazing!

## viralshield

November 30, 2013 at 2:28 am

superb

## Gego/XAREN ゲゴザレン

December 7, 2013 at 8:19 pm

And this got us the super compression algorithms LZMA and LZMA2.

🙂

## YuKoNLiFe

March 1, 2014 at 11:24 am

What the fuck are you smoking

## Mikey Lin

November 26, 2014 at 4:39 pm

good background knowledge. thanks for sharing.

## P G

March 27, 2015 at 9:26 pm

Are you Canadian by any chance ? The way you say "about" or "out"… 😉

## TheResonating

May 11, 2015 at 3:46 am

@Art of the Problem so wait what does the binomial distribution have to do with markov chains?

## Jamie Yakscoe

December 17, 2015 at 2:29 pm

where is a good place to go if i want to learn more about markov chains in protein synthesis? where did you guys get the picture from @7:06 ?

## Snomi94

April 11, 2016 at 6:47 pm

3:05

"-this overall curvature, known as the binomial distribution…"

The distribution of y/n trials is binomial.

The curves drawn are a normal/Gaussian distribution 🙂

(Which of course the binomial distribution approaches for high n as per the central limit theorem)

## Crayon Chaney

November 17, 2016 at 1:58 pm

6:17 why the probability of state 1 loop back to state 1 is 0.65? I thought state 1 is 50/50 light versus black.

## Chinmay Upadhye

November 21, 2016 at 7:23 pm

Awesome explanation….! Thanks for elaborating.

## Cameron Kirk

January 4, 2017 at 3:26 am

Wow, superb video. this channel needs more subscribers lol

## vaibhav agarwal

January 29, 2017 at 2:44 pm

I like how you introduce this, philosophical.

## Xpry

January 30, 2017 at 1:21 am

Why does this guy's voice crack so much?

## jpm93

February 9, 2017 at 3:50 am

Great video, very interesting, but please lose the creepypasta music. Maybe substitute it for something?

## jpm93

February 9, 2017 at 3:53 am

What did you use to construct the animation at 6:15 ? 🙂

## Ben S

February 21, 2017 at 4:10 pm

These videos are so great! Art indeed.

## Israr ullah

April 18, 2017 at 3:05 pm

allah has created this world and mohammad(pbu) is the messanger of allah

## Dylan Barth

June 12, 2017 at 6:37 pm

3.5 minutes in and you've not even touched on what a Markov Chain is. Is this allude to Plato really necessary? It seems tangential at best, and outright misrepresenting his philosophy at worst. And yes the history of how we got to modern probability theory is important, but is it worth going into to describe this one process? I'd like to know why you included so much history that really seems to divert attention away from the topic, maybe I've missed the point entirely.

## jaydev kalivarapu

August 11, 2017 at 12:06 pm

Lot of material in this video is taken from Khan Academy introduction video. Well explained though.

## PCL

August 15, 2017 at 2:35 pm

Seriously, man! Excellent vid!

## Ujjwal Garg

September 14, 2017 at 11:46 am

beautiful.

## Martin Milter Bay

October 17, 2017 at 5:11 am

Good video. Good mix of historical and mathematical information. Thanks

## Naomi Garcia

January 25, 2018 at 12:04 am

This is the best explanatory video of this sort I have seen! Excellent work and thank you for helping our community!!! I will definitely be passing this along.

## Paulo Jose Castro

March 4, 2018 at 9:07 am

history repeats itself

## Jorge Olea

May 8, 2018 at 7:21 pm

Lots of statistical terminology being used

Central limit theorem

Binomial distribution

## A Beaumont

November 16, 2018 at 4:55 pm

Best video ever!

## umer Arshad

December 11, 2018 at 11:00 am

best video i have seen