Information Theory part 11: Claude Shannon: A Mathematical Theory of Communication

Now, Shannon had just finished
developing his theories related to cryptography
and therefore was well aware that human
communication was a mix of randomness and
statistical dependencies. Letters in our
messages were obviously dependent on previous
letters to some extent. In 1948, he published
a groundbreaking paper, “A Mathematical Theory
of Communication.” In it, he uses Markov
models as the basis for how we can think
about communication. And he starts with
a toy example. Imagine you encounter a
bunch of texts written in an alphabet of
A, B, and C. Perhaps you know nothing
about this language, but you notice that As seem to clump
together while Bs and Cs do not. He then shows that you
could design a machine to generate similar-looking
text using a Markov chain. And he starts off with a
zeroth-order approximation, which means we just
independently select each symbol, A, B, or C at
random, and form a sequence. However, notice
that this sequence doesn’t look like the original. He then shows that you
could do a bit better with a first-order approximation
where the letters are chosen independently,
but according to the probability
of each letter in the original sequence. So this is slightly better,
as As are now more likely, but it still doesn’t
capture much structure. The next step is key. A second-order
approximation takes into account each pair
of letters which can occur. And in this case, we
need three states. The first state
represents all pairs that begin with A; the second,
all pairs that begin with B; and the third, all
pairs that begin with C. Notice now that the A
cup has many AA pairs, which makes sense, since the
conditional probability of an A after an A is higher in
our original message. Now we can generate a sequence
using this second-order model easily, as follows. We start anywhere
and pick a tile, and we write down or
output the first letter and move to the cup defined
by the second letter. Then we pick a new tile
and repeat this process indefinitely. Notice that this
sequence is starting to look very similar to
the original message, because this model is capturing
the conditional dependencies between letters. And if we want to
do even better, we could move to a third-order
approximation, which takes into account groups of
three letters, or trigrams. And in this case, we
would need nine states. But next, Shannon applies
this exact same logic to actual English
texts, using statistics that were known for letters,
pairs, and trigrams, et cetera. And he shows the
same progression from zeroth-order,
random letters, to first-order, second-order,
and third-order sequences. He then goes on and
tries this same thing using words instead of letters. And he writes “the resemblance
to ordinary English text increases quite
noticeably at each depth.” Indeed, these machines were
producing meaningless text, though it contained
approximately the same statistical structure
you’d see in actual English. Shannon then proceeds to
define a quantitative measure of information, as he realizes
that the amount of information in some message must be tied up
in the design of a machine that could be used to generate
similar-looking sequences. This brings us to his
concept of entropy.
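
The progression described above can be sketched in a few lines of Python. The sample message below is a made-up stand-in for the A/B/C toy text (not Shannon's actual example), chosen so that As clump together; the `cups` dictionary plays the role of the three states, each cup holding the second letters of all pairs that begin with its letter.

```python
import random
from collections import Counter, defaultdict

random.seed(0)

# A hypothetical "original" message in which As clump together.
original = "AAACABAACCAAAABAABCAAABAACAABACAAAABBAAACAACAAAB"

# Zeroth-order approximation: each symbol chosen uniformly at random,
# ignoring the statistics of the original entirely.
alphabet = sorted(set(original))
zeroth = "".join(random.choice(alphabet) for _ in range(30))

# First-order approximation: symbols still chosen independently,
# but with the same letter frequencies as the original.
freqs = Counter(original)
letters, weights = zip(*freqs.items())
first = "".join(random.choices(letters, weights=weights, k=30))

# Second-order approximation: a Markov chain over letter pairs.
# Each "cup" (state) holds the pairs beginning with that letter; we
# output the state's letter, then move to the cup named by the tile's
# second letter.
cups = defaultdict(list)
for a, b in zip(original, original[1:]):
    cups[a].append(b)

state = random.choice(original)
out = []
for _ in range(30):
    out.append(state)
    state = random.choice(cups[state])
second = "".join(out)

print("zeroth: ", zeroth)
print("first:  ", first)
print("second: ", second)
```

Running this, the zeroth-order line looks nothing like the original, the first-order line has roughly the right proportion of As, and the second-order line starts to show the same clumping, because the pair statistics are preserved.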


  1. Jacob Manson

    June 27, 2013 at 11:28 pm

    How do you know there are already going to be 16 videos? Are you holding out on us?!

  2. Art of the Problem

    June 27, 2013 at 11:41 pm

    Great to hear! I just hope you don't end up dreaming in scrabble tiles as I now do 🙂

  3. Torbjorn Loken

    June 28, 2013 at 1:02 am

    I believe the description should read Claude Shannon not Clause Shannon

  4. Diego Mesa

    June 28, 2013 at 1:34 am

    fantastic! please keep it up!

  5. Romário Rios

    June 28, 2013 at 5:31 am

    I think he already has the ideas for each video, or maybe even some scripts, but not yet the videos themselves.

  6. Paul Miller

    June 28, 2013 at 7:26 am

    What we will do tomorrow is more related to what we do today than to what we did yesterday. As with transition probabilities, so too with a video series about them! Indeed you can't get very far from the fact that you really don't 'know' much more than the present is telling you.

  7. MrBumbo90

    June 28, 2013 at 10:53 am

    You deserve more views amaziiing

  8. GunnarAztek

    June 28, 2013 at 1:14 pm

    If i had THIS in school I WOULD HAVE BEEN WILLING!

  9. Athanasios Kountouras

    June 28, 2013 at 4:19 pm

    Why wasn't this made a year ago? It would have helped me so much!

  10. RimstarOrg

    June 28, 2013 at 4:41 pm

    Due to the cliffhanger in part 11, I was happy that you released part 12 so soon after. But we're left hanging at the end of part 12! What was his concept of entropy? Will we die a heat death before finding out?!? 🙂
    Your videos are beautiful and informative. Keep 'em coming!

  11. pebre79

    June 29, 2013 at 6:45 am

    superb! excellent! outstanding videos!

  12. pebre79

    June 29, 2013 at 6:46 am

    another cliffhanger gah!!

  13. MzN SooSy

    June 30, 2013 at 10:38 am

    New Idea just born!
    Thanks man

  14. Emil Fihlman

    July 1, 2013 at 5:49 pm

    Actually I believe that there is a statistical average that is greater than zero (days before today) that determines what you'll do tomorrow because we humans strive for order and norms.

    My decision from two years ago to attend upper secondary school affects my life more than what I think today because I determined my course of daily action quite precisely back then.

  15. John A

    July 3, 2013 at 8:19 pm

    I love this series so much that I randomly hear that intro piano tune in my head!

    I seriously had a mini orgasm when I saw this video was posted. THANK YOU for creating this absolutely amazing series. Ever since finding your channel I have been absolutely in love with information theory, I even started reading Claude Shannon's "A Mathematical Theory of Communication"!

  16. peterluigi

    July 6, 2013 at 9:08 am

    when is next video?

  17. Josh Tiley

    July 16, 2013 at 3:00 pm

    Where are the last four?

  18. Diogo Canina

    August 14, 2013 at 6:38 pm

    Where are the remaining four?? Please!

  19. TheAnachronist

    August 21, 2013 at 8:57 pm

    That's wonderful! I'm working on this as a small part of my Masters Thesis, and this is a great, entertaining introduction to the history behind information theory.

  20. Crux161

    September 7, 2013 at 8:59 pm

    Mmm delicious 🙂

  21. 6san6sei6

    September 11, 2013 at 11:31 pm

    man your videos are amazing. i want to put your channel in every social media there is.

  22. bassbs

    September 21, 2013 at 4:33 am

    Thanks for doing your videos! They are awesomely enjoyable!

  23. bassbs

    September 21, 2013 at 4:35 am

    and Best wishes!!

  24. Clint Miller

    October 10, 2013 at 4:51 am

    these videos are great – i am looking forward to the last four

  25. satchel frost

    October 24, 2013 at 5:36 pm

    More….. MORE I TELL YOU…. I need my fix. (p.s. congrats)

  26. عبدالعزيز الشريف

    October 30, 2013 at 10:24 am

    Where are the remaining four?? been too long Bro!!!!!!!!!!!!!!!!! we need a date ASAP

  27. Saad Saeed

    November 1, 2013 at 12:15 am

    You are simply amazing at explaining! Have a gift!!

  28. TheAnachronist

    November 15, 2013 at 7:04 am

    So excited for the next video…… I just used this video to explain how the Facebook bot works.

  29. drusamslivata

    November 17, 2013 at 1:06 am

    this man is gifted at explaining stuff

  30. J Eveson

    November 26, 2013 at 11:09 am

    Really enjoyed this series, are the next four coming any time soon?

  31. Mr Wp

    November 27, 2013 at 8:22 pm

    As always, sir – you have done brilliant job! Your videos are just a pleasure to watch!

  32. d.eileen.d

    December 2, 2013 at 4:01 pm

    geez this series is just so good… here's a little article about the amazing Claude Shannon for anyone else who can't get enough of this… 😀

  33. northern shark

    December 30, 2013 at 10:52 am

    God…now it makes sense

  34. ethan ztz

    January 12, 2016 at 7:00 pm

    3:22 "Meaningless text" haha write

  35. N F

    August 20, 2016 at 12:28 am

    Shannon world war 2

  36. Craze Maqondo

    October 21, 2016 at 12:36 pm

    can anyone demonstrate the 9 states of the 3rd order – the trigrams sequences?

  37. Naveen Mysore

    February 2, 2017 at 12:01 am

    Wow!! this channel is so beautiful..

  38. ererrs

    March 21, 2017 at 2:42 pm

    More videoes, please. Finally I understood how HMMs are used in speech recognition!
    This format was so much more fun and informative than the typical academic/formal format. I see great creative potential if good teachers like this are given access to teach us online.

  39. Josh B.

    July 2, 2017 at 6:47 am

    Hey! Great videos! Thanks for putting these up! I have one question: when you talk about the approximation being second-order, do you mean it's first-order, since an n-th order approximation from other sources like here ( show a second order approximation as depending on the past two letters, not the past one, like you show.

  40. Nathan Harvey

    August 4, 2017 at 11:11 pm

    You need to either number your videos sequentially or provide a link from each to the next (linked list) or next and previous (doubly linked list).

  41. 정민영

    January 19, 2019 at 11:24 pm

    Nice videos and explanations! I watched from the beginning and this series is definitely worth my time watching! Thank you:)

  42. Stefan Fink

    February 13, 2019 at 6:00 pm

    Improvementagement is such a brilliant word 😂 Nice work 👍

  43. Rxmbo

    May 9, 2019 at 6:26 am

    so wierd that we talk the same language but i dont understand shit your saying im so retarded

  44. Threelly AI

    May 13, 2019 at 8:50 pm

    Youtube comments are full of amusements sometimes….
