Knowledge, Noise and the End of Information


– Good afternoon, everybody, and welcome to the fourth in our series of 40th-anniversary speakers, who are helping us explore the future of information and the way that we’re all going to cope with the environment we’re seeing develop. My name is Nancy Gwinn, and I’m the Director of the Smithsonian Institution Libraries. This series is being co-sponsored by Anne Van Camp of the Smithsonian Institution Archives and by Spire from the Office of the Chief Information Officer; in fact, she is the Chief Information Officer. We’re very pleased today to have a very distinguished speaker. This is being webcast today, and it’s going very smoothly thanks to our webcast and AV folks. So at the end of the lecture, when we have questions, I would ask you to use the microphone so that we can be sure
to capture the questions and answers for the viewing audience.

Well, David Weinberger, our speaker today, has a PhD in Philosophy from the University of Toronto, and he began his career teaching the subject at New Jersey’s Stockton State College. It’s quite a leap from that to his current position as a fellow at the Berkman Center for Internet and Society at the Harvard Law School. Along the way, he gained experience in marketing, business and startup companies while maintaining a steady stream of freelance writing for newspapers and magazines such as the New York Times, the Harvard Business Review and even Smithsonian magazine. He writes intellectual and academic articles as well as humor, which ensures that the former are entertaining while still conveying serious messages. In 1994, David Weinberger founded a one-person strategic marketing company, Evident Marketing, and the next year joined Open Text as Vice President of Strategic Marketing. Open Text was an early search engine company (the first search engine used by Yahoo, in fact) and an innovator in internet collaborative software. After helping to take Open Text public in 1996, Dr. Weinberger returned to consulting, writing and speaking, participated in founding
a couple of dot-coms and served on industry and company boards. In 2000, the publisher Perseus published the national bestseller The Cluetrain Manifesto, of which he is a co-author. In 2007, last year, Times Books published Everything Is Miscellaneous: The Power of the New Digital Disorder, about the new principles of digital organization and their effect on knowledge and authority. That’s where I came to know him and to appreciate his views, and I wanted to bring him here as one of the stars of our speakers series. This year, last semester, he taught a course at the Harvard Law School called The Web Difference with Professor John Palfrey.

Dr. Weinberger is asking big questions about society. For example, he has written: “We’re changing the basic principles by which we organize our world. What effect will that have on our institutions and on our way of understanding ourselves and the world we share?” In his book Everything Is Miscellaneous, he takes us on a journey through the human constructs of classification, from alphabetization through the Dewey Decimal System: all necessary but limited approaches to organization that sometimes seem a bit antiquated today. I’m sure his presentation will provoke lots of questions as he talks today about knowledge, noise and the end of information. So please welcome David Weinberger. (audience clapping)

– Thank you very much. Thank you for having me here. I’m honored, actually, and absolutely not up to the task. So let’s begin, shall we? (sighs) It seems to me that we are. (audience laughing) It seems to me that we are emerging from the age of information
in a particular sort of way, and that this is painful and disturbing and upsetting and throws us into an unknown world, ’cause we understand information pretty well. But it’s also quite liberating and hopeful. So I want to talk about the effect of this transition that we’re going through, a somewhat violent transition in many ways: intellectually violent, and perhaps also in terms of institutions. In particular, I want to talk about the effect it is having on our idea of knowledge.

We have not simply been in the age of information; we’ve actually internalized information, the concepts of information, to a large degree. That’s odd, because when we started in this age, we were quite fearful of it. We resented it. We viewed computers as instruments of control and of conformity, of efficiency and management, and that showed up all over our popular culture. You know, the “Do not fold, spindle or mutilate” sort of button that we used to wear in the ’60s, those of us who remember the ’60s, which is perilously few of us, I can see. Hmm. So computers: we resisted them, we resented them at first. And we were right to. That notion that computers minimized people and enforced a type of conformity was basically right for the ’50s and the ’60s. We all know the people that we’re working with, the person in the cubicle next to us. We know all sorts of things about her, because we interact with her and we’re friends with her. And we also know that to the computer, especially as this age was getting going, a person looked like this. They were minimized, right? Information was born out of a need to control. So you strip things down into a handful of categories, a rows-and-columns view of life. Very, very handy. Very important. I don’t mean to minimize that at all. But we paid a price for it, which was to strip out context. That’s where information came from: it was in fact the stripping out of context, just as the beatniks and the nonconformists and the hippies were saying. And we were basically right about that. We of course got huge
benefit from it as well. And somehow we went from resisting and resenting computers to taking over the concepts of information so deeply and so profoundly that we re-understood who we are by means of them. It’s totally cradle to grave. In terms of the cradle, all of us in this room, I suspect, believe that DNA is information. And if I were to say that DNA is not information, you would think that I’m anti-science and ignorant and the rest of it. I am not anti-science. I’m not terribly ignorant. But I do think that DNA is not information. DNA is a squiggly, physical molecule that we benefit from understanding better by applying the principles of information science to it. But DNA is not information. Yet somehow we’ve become so enculturated to information that that sentence, “DNA is not information,” sounds anti-science. That’s the cradle. As for the grave, we have this notion that consciousness is itself information. And some extreme members of our culture, the Singularity people for example, believe that if we can model the brain in a computer, taking each of the hundred billion neurons, modeling it in a computer and applying the programming principles of the computer, then the person will outlive his or her death. And that’s wrong. (audience laughing) But even the fact that this makes sense to us, that we know what they’re talking about, is a sign of how deeply we’ve internalized information.

So we’ve swung really wildly
over the past 40 or 50 years. How did this happen? Well, I want to go briefly back to the origins of information: 1948, Claude Shannon, who wrote a paper that kicked the whole thing off. It’s a difficult paper. It’s mainly math. I’m a humanities major; I can’t understand it. But he produced a diagram that has become a part of our culture. This diagram makes immediate sense to just about everybody, so its effect has been quite profound throughout the culture, even though it’s an odd idea. The idea is that communication consists of taking an idea, a message, encoding it, putting it through a medium, decoding it and getting the message back out. That makes total sense within information theory, but I think it’s actually just sort of weird.

The strength of Shannon’s idea was that he abstracted information from all media. It didn’t matter whether you were talking about Morse code or telephone signals or the internet or carrier pigeons or whatever it is; his theory applies. It is absolutely abstracted from the medium, and that’s where its strength is. Shannon takes over a couple of ordinary-language terms. That’s one of the striking things about the paper: it’s a hugely technical paper, and yet he’s using words that are absolutely in common parlance. And he did this on purpose. So he talks about messages and codes and signals. But the one that I want to focus on briefly is noise. It’s an odd choice of term. Noise is the enemy of information. So that’s information going across the top there; noise is the interruption. It is the enemy of information. Noise is to information as mere seeming is to truth.

Why noise? Why that term? Well, there are a couple of reasons. One is that he was at AT&T, so line noise could actually be heard. That’s a pretty strong reason. But there’s another reason
that turns out to be really important as well. The milieu of information theory was World War II. Many of the people who were involved in it, who shaped it and disseminated it, came out of working on World War II projects. One of those projects addressed a very discrete problem, which is that battlefields are noisy, so soldiers could not hear and understand commands. A center was set up at Harvard, and it studied this and came up with a thousand-word controlled vocabulary in which the words were designed to be distinguishable from one another. In many ways this mapped perfectly to Shannon’s theory. This is where you get, by the way, the “Able, Baker, Charlie, Fox,” the “this is Able Company.” That came out of that project: a controlled vocabulary of distinguishable words. There’s a wonderful book about this called The Closed World, by Paul Edwards. It’s a great book about World War II’s effect on information theory. So you have a theory of information that maps to the battlefield condition, the condition in which you can’t hear. That’s what the theory arose from. That’s one of the important reasons why noise is so central to this theory: it grew out of this exceptional case.

And so, as happens way too often in philosophy, or in the history of the intellect, we take an exceptional case, like soldiers screaming over the sound of grinding artillery on the battlefield, and we treat it as if it’s especially revelatory of the ordinary case. So we end up with a theory of communication that’s based upon the exceptional conditions of the battlefield. That’s one important way philosophy has traditionally gone wrong, and I think this is yet another example of it. Exceptions are bad ways to make rules.

So we have this case
where we have information going across the top, and we have noise, and noise is the enemy of information. What’s missing here is the world, because this is a purely formal diagram, right? That’s the whole point about information theory: it’s abstracted from all of the vagaries of actual media. And yet the real world, the vagaries, come back in through noise. Noise is where the world makes an appearance in information theory, as an interruption. So we have the formal, perfect realm, the sort of Platonic realm of transmitting information, interrupted by the rude, uncontrollable, unpredictable world: the thing that is not abstracted, the thing that is not formal. It’s the real thing injecting itself into information theory and therefore appearing as the enemy of information. So: information theory, and noise as the guise in which the world is able to show itself in this formal theory.

So now I want to take a
very big step backwards, about 2,500 years, and talk about knowledge, ’cause knowledge is knowledge of the world. So what has knowledge looked like to us? This is gonna be very rapid. It’s all straightforward. You know all this; in fact, I think you know all of this. But this is sort of a refresher course on knowledge in the West. And I’m only talking about the West ’cause it’s the only thing I know anything at all about.

So, we have assumed in the West that there is a single knowledge. We do not have a plural in English for knowledge. There’s one knowledge because it’s of a single world, and thus it is the same for everyone. It’s universally true, and boy, do we get into a lot of trouble with that one. Nevertheless, it’s the same for everyone. And it’s simple. Not in the sense that string theory is so simple that we should all understand it, but in the sense that knowledge explains the chaos and confusion of the world. The world looks chaotic and doesn’t make sense, but then we get the knowledge. We get the insight. We pierce the appearances and we see the simplicity, the relative simplicity, underneath. We have assumed that knowledge is settled: once we know something, we know it. We don’t keep on knowing the same thing. We know it, we write it down so other people can read it, and we move on. We’ve assumed that knowledge is not just a bunch of statements thrown together, but that there’s an order. It’s systematic. It’s orderly. We’re very fond of this type of order in the West, but in any case, orderly. And we’ve assumed that knowledge is scarce, relatively scarce. Most of what we believe is not true. Most of it is mere opinion, and only some of those opinions are true. It’s good to be able to tell which are which; nevertheless, knowledge is rare, that is, relatively scarce.

So, given these assumptions
about knowledge, we’ve come up with a set of tools and institutions for dealing with it. All right, this is one, for example; perhaps it’s familiar to some of you. You get thousands of artifacts in a day and you need to categorize them and classify them, and you have a large staff of experts who are able to do this. I’m pandering now, but I say this everywhere I use this slide: it’s a very difficult skill. It takes a lot of training to do it well. But we know how to do it. Roughly, we know how to take what we know and make sense of it, order it and organize it. Likewise in the news business: we know how to do it. It’s really hard. It takes a lot of training and dedication. But we know how to do it; we’ve gotten good at it. The only issue is, it doesn’t scale. It doesn’t scale really well. When you’re trying to control and organize and classify and manage, that works for the amount of scarce knowledge that we have, but it doesn’t particularly scale, because control in general doesn’t scale well. You can if you really try. You can even scale governments to form totalitarian states. But it takes a lot of work. It’s really, really hard to do. And they tend not to last all that long, thankfully. Control doesn’t scale easily, anyway.

So, this hasn’t been a
terrible problem for us when it comes to organizing knowledge, because our knowledge has been gated and kept scarce by the physical means we’ve used to preserve it and to communicate it. We’ve been using books, primarily; we’ve been writing things down. And books are physical objects. They’re hard to produce. Very few people who want to be authors become authors. It’s a one-to-many relationship, and the filter is pretty narrow. So that has been a natural, literally a natural, limitation that has enabled us to build systems that work at that scale, the scale that atoms allow. It’s not a coincidence that many of the salient properties of knowledge are also properties of books: scarce, orderly, settled, the same for everyone. Those are properties of printing presses, not natural properties of knowledge. The fact that the two share the same characteristics is not due to this being what knowledge is and wants to be and should be; rather, it’s due to the fact that our physical means of communicating and preserving knowledge have been so limited.

But now we’re digitizing everything. This audience knows I’m exaggerating wildly. But for many people, the vast majority of what they deal with when it comes to information will be digital, even if that’s a tiny percentage of the world’s resources. Nevertheless, we’re digitizing as much stuff as quickly as we can, and it’s increasingly important to all of us. So that means we can now
see what noise looks like. This is the internet. The internet has particular properties that you’re all familiar with. It’s open; it’s non-systematic; no permission is required. It was built without any managers. There’s nobody with a business card that says “Manager of the World Wide Web.” Had there been, the thing would never have gotten as big as it did. It was only by removing permissions and removing management that the web was able to scale. And so we have this incredibly messy, disorganized, chaotic fullness of stuff, with difference manifest at every turn, at the nodes and within the nodes. Not homogeneity, not universality, but difference. People disagreeing with one another. Every point of view. Every wrong fact, every right fact. Every rumor. Every artwork, every piece of porn. It’s all there, and it disagrees with itself. Otherwise the web would be very uninteresting and not at all useful. It’s the fact of the difference, that the pieces are not all saying the same thing or all from the same point of view, that makes the web so appealing and so useful. Which is simply to say that the web, the internet, is incredibly noisy. It is unbelievably noisy. And that’s its strength.

So the Sarah Palin page at Wikipedia is a very good page. It’s actually a really
good source of information about Sarah Palin. But the talk page, the discussion page, is way better. (audience laughing) This is just a list of the topics. The discussion page is incredible. It is maybe the single best place to go to learn about her, because you get not only the agreed-upon language, the working out into neutral language, but you get the reasons behind it. You get the differences. You get the anger, you get the annoyance. You get people trying to find common ground, ’cause one of the remarkable things about most Wikipedia talk pages is that people who violently disagree agree about the premise of the conversation: that they need to try to work out something, an agreement. So you see the disagreement here. You see the difference. And the sociological and historical value of a page like this, 50 years from now, is gonna be incredible. You want to understand what people were thinking when the 45th president of the United States, Sarah Palin, was elected? (audience laughing) The 46th? Whoever is after McCain. You’ll go to this page and you’ll say, “Oh, man, she burst onto the scene. This is what people were arguing about. These were the issues.” It’ll be an amazing artifact.

So why don’t all encyclopedias
show us their talk pages? Well, one can snarkily, and when I say one, I mean I, ’cause I’ve frequently been snarky about this, I can snarkily say, well, they want to have the appearance of authority, and there’s some truth to that. But it’s certainly not what the encyclopedias say. They’re saying something very different. They look at the correspondence, the emails back and forth between their authors and their editors, and they say, well, this is where they’re working out bad ideas, right? This is how they wrote the great article that we have. But the back and forth, that’s noise. That’s a distraction. That’s noise. Yet the noise that Wikipedia shows us is the most valuable thing that Wikipedia shows us. I would absolutely maintain that the talk pages are almost always higher value than the pages themselves. At Wikipedia, the noise is the value. For traditional publishing, nope: those discussions look like mere noise.

But it’s not mere difference. What you get on the web is not just a bunch of people, each of whom is saying
something different, like a hundred million, a hundred million? Let’s say a billion different editions of the newspaper. Rather, this is a linked environment. So you get these points of view, these hundred million points of light or whatever it is, and you get them connected and contextualizing each other. Which means that we are maybe now in a position to take one step beyond, in the Hegelian sense, the old contradiction between objectivity and subjectivity, where objectivity has the weight of showing us the world as it would look if nobody were looking at it, and subjectivity has the benefit of showing us the interior of how people are thinking about it. But you have all these different subjectivities, and, you know, it’s just one person against another person. Well, in a linked environment, on the internet, you don’t just have individual subjectivities. What you have is individual subjectivities in various forms of conversation. Some low value, some high value, some academic. Some crazy and flaming and hateful and racist. Nevertheless, you have the subjectivities in conversation with one another. And that, in some ways, gives some of the heft of objectivity to subjectivity, to linked subjectivity.

So, as we move online, the principles
of knowledge, I think, change. I’m only gonna talk a little bit about a couple of these. The main one is this. In the late 1980s and early 1990s, there were books written about how we were going to be overwhelmed by information. We were going to drown in it; we’re doomed and there’s no stopping it. And it turns out, of course, that the amount of information is orders of magnitude larger than anybody foresaw. But we’re really doing pretty well. I’m not overwhelmed by information. I get more spam than I want and so forth, but, you know, not at all. I feel better able to find stuff than ever before. They were absolutely wrong. Those predictions were absolutely wrong, even though there’s way more information than anybody said. And for a simple reason: we succeeded. ’Cause it turns out the solution to the information overload problem is to generate more information, as long as it’s metadata. Metadata obviously lets us manage the information that we have, and we’ve gotten really good at generating metadata. Not good enough, but lots. Lots and lots of it.

And the basic nature of metadata, I think, is now changing. We’re very used to the idea that there’s a difference between the thing and the metadata describing it: the label and the person, or the folder, the label on the folder and
its contents and so forth. We’ve taken that for granted, and we’ve built complex and extremely useful metadata systems that depend upon that clear distinction. We’d be lost without them. Well, so you go online, and let’s say you have a little piece of metadata, which is the name of the author of a book, and you want to remember what the book was. So you type “Herman Melville” into your favorite search engine, and you get back the title of the book. Except, because everything is online, you now also get back the content of the book, especially if it’s not copyrighted. Carefully chosen example. So you get back the content as well. Now let’s say you have a different issue. You remember a little bit of the content; let’s say it’s the first line. So you put that into the search engine, and now you will get back the name of the author. So the content has become metadata. But you won’t just get back the name of the author; you’ll get back everything the web knows about the author. To the millionth degree, you’ll get back the author, the bibliography, the biology and ecology of whales. You’ll get back a map to the author’s house in Pittsfield. You’ll get back his social network: Nathaniel Hawthorne, and the time they went up Monument Mountain and had the picnic. And Monument Mountain and the bird. You’ll get back everything! You’ll get back Al Gore within two steps of this. (audience laughing) You will get back everything.

So it turns out that the old distinction between metadata and
data simply doesn’t hold. Metadata is simply the thing that you know, and data is the thing that you’re looking for, and that obviously changes depending on what you know. It’s merely an operational difference. But this is fantastic news: hugely important and, I think, wonderful news for us as a species, because we use metadata to pry up knowledge. Here’s what I know. Oh, here’s what I didn’t know. I got it. And if everything now is metadata, if all content is metadata, then we just got way, way, way smarter as a species. It also means that drawing distinctions between the content and the metadata and the links? Nah. It’s all connection. It all should be connection. As far as we can, we’re gonna make everything into a connection, and we’ll be smarter for it.

Because we’re dealing with metadata here, another one of the old rules
falls by the wayside. One of the assumptions in the physical world is that the people who own the stuff also own the organization of the stuff. If you own the auditorium, you own how the chairs are organized. If you own the clothing store, you own how it’s laid out. If somebody goes through with a shopping cart, making a huge pile of everything that fits her and ignoring everything else, which is the completely rational way of proceeding, by the way, you’ll throw that person out, ’cause she does not have the right to reorganize your stuff. But online? Absolutely. People will reorganize. And if you don’t give them the tools to, they’ll either stop coming or they’ll go to another site and use metadata to point at what you have. Because pointing is free. Metadata is free. We don’t need your permission, and we’ll organize it that way. We’ll do every possible type of organization: multiple layers of organization, personal organization; we’ll aggregate the personal organization and get social organization. We’ll do everything we can, because you who own the content do not own the ordering of it anymore.

Which means, I think, that a really
basic principle changes: a basic strategy for dealing with this stuff, which was to minimize. Get experts, exclude from the collection anything that’s worthless to make the collection higher value, and then organize it in ways that will make sense. Anticipate your users’ needs. That’s what we’ve had to do; it’s absolutely what you have to do with a physical collection, for example. It makes sense. I know it’s very hard, and we do it very well. But digitally, that doesn’t make nearly as much sense as saying, well, no, let’s include everything. These days it’s usually more expensive to throw things out digitally, because you have to inspect them first and decide, than it is just to include them. So include everything, including the stuff that, and I don’t mean you, I’ll say “one,” the stuff that one might be excluding because it seems trashy, because it’s about Paris Hilton or Britney or Lindsay or whoever. You know, it’s just sort of trash news. It’s not real news. And so you exclude it. That was right when you had
limited amounts of space and it was difficult to retrieve things. But in the digital age, no, include it all. Because somewhere, in fact it’s East Anglia, there’s gonna be a set of academics studying the effect of celebrity culture on women in the early 2000s, and by excluding all those trashy articles, you just destroyed their research. We had to try to anticipate what people were interested in and how they wanted to search, because we had to. Somebody had to make those decisions. But now we don’t. And we never could. We never really could anticipate what people are going to be interested in.

An example came up at lunch. I’m sorry, I don’t mean to pick on Sarah Palin, but I’m about to a little bit, just as an example. What’s the name of the town she was mayor of? (audience murmurs) Wasilla. Isn’t it amazing we all know that now? (audience laughing) Except me. So, wouldn’t it be interesting to be able to go back to the Wasilla town meetings, especially the ones where the librarian was present, in order to figure out what was true and what wasn’t, and all the rumors running around about censorship and so forth? That turns out to be really interesting. But if I had asked you last week, okay, we need an archive, how important are the town meetings with the librarian of the 6,000-person town of Wasilla, Alaska, you would have put them off with Britney and the rest of them. Aw, yes, it’s interesting, but we don’t have room for it. You cannot anticipate what people are gonna be interested in. We used to have to; now we don’t.

In fact, we don’t even have to anticipate how they’re gonna want
to search for things. We can postpone deciding how this stuff is gonna be organized and made searchable until the moment the user comes along. And you all know the tools and techniques that are being developed to do this. It’s an incredibly fruitful period in human history: over the course of these 10 or 15 years and on into the future, we have this compressed period in which we’re inventing new principles and techniques for organizing ideas and information. We’ve never had a period like this. We’ve never had an opportunity like this to come up with new ways of doing it. You know how many thousands of years it took to get to alphabetization? This is an incredible period. So, I’m sorry, there’s the “postpone.” We can postpone, until the moment the user asks, how we’re going to organize the material for her.

So I want to give an example. This will perhaps seem like pandering as well. I thought about taking it out, but I decided to leave it in, because I’ve been using these slides and I thought, at the very least, I’ll tell you what I’ve been saying about them. ’Cause this has become my standard example of a good way of doing this stuff. Have I pandered to you enough? Is anybody here involved in this project? Thank you. (audience member murmurs) I think it’s fantastic. I love this example. And, to step out of lecture mode for a moment, I’ll tell you: people get this immediately. Everybody responds to it really positively. It’s just a wonderful, wonderful example. It’s also a really good site, so. Okay, so. Is everybody really familiar with this? (audience murmurs) Some no’s? Okay, so the Library of
Congress, bless its soul, and you can correct me when I get this wrong, damn it, found 3,000 color photographs from World War II. Gorgeous, gorgeous photographs. It needed help filling out everything we know about them. So they put them up on Flickr, you know, the Yahoo photo site, and let the public use the facilities at Flickr. The Library of Congress put up what it knew about them, which is some, not a tremendous amount; it knows something about these photographs. And then it said to the public, go ahead, use the tagging at Flickr, for example. So you get a set of tags, some of which you would expect from a professional cataloger at the Library of Congress, but many of which were idiosyncratic or crazy, or you can’t figure out why. But some were actually really, really useful. For example, “Rosie the Riveter.” A professional cataloger might not have tagged it that way, because this is not Rosie the Riveter. Nonetheless, if you were interested in Rosie the Riveter, or in the role of women in industry in World War II, this might be a really interesting photograph. It makes sense. “Coif,” C-O-I-F, coif. Sorry, it’s an odd word, isn’t it? It’s not a word that you use tremendously these days. But she’s got this gorgeous 1940s hairstyle. If you’re interested in that, again, it’s a good word to use. “Red,” because of the little bit of dramatic red on the probe and on her lips. So you get these interesting tags that a professional probably
would not have come up with. Flickr also allows you to annotate. You draw a box and you attach
text to it. So the circle there, that set of nested boxes, that’s an argument. The first box says, oh yeah, this photo was totally posed because the tip of her ballpoint pen isn’t even extended. Which is a really profoundly stupid comment, by the way. And the next person says, no, actually, it’s not a ballpoint pen. The next person, nested inside, says, and it is extended. If you zoom in, you can see the thing is extended. So here, we will use whatever facility
is given to us to argue. We are an insanely social species. We will socialize
everything we possibly can. And of course, Flickr also
lets you add comments. So one of the very first sets of comments that get added? Flickr, it turns out, only allows 75 tags, because Flickr was thinking people are gonna post photos of their uncle on the beach and you’d need the tags uncle, beach and sunburn and you’d be done. They were not thinking of historic photos where maybe you’d need more than 75. So the 75 filled up in a matter of days. And people started hacking Flickr, in the good sense of hacking, using the comments to insert tags: if there had been room for more tags, I would have tagged it whatever. Now, perfectly good use of it. But people also used
comments to make more normal sorts of comments about
the sociology at the time, the role of women. A bunch of questions
and comments about the actual technique, the
photographic technique, which somebody at the Library of Congress, maybe somebody here, answers in a very straightforward, friendly way. That person gets a bit of metadata over on the left, that image, saying this is authenticated. You can believe this. And the person also puts in a link
to more information ’cause we always want more
information on the web. And because it’s the web, within days, we get a conspiracy theory. (audience laughing) Can it be simply a coincidence that during this presidential campaign, a bunch of patriotic photos have been released? Clearly, this is a Republican ploy. (audience laughing) And then, because it’s the web, somebody writes in and says,
you’re an idiot and moves on. (audience laughing) So I actually think
it’s a wonderful example of a whole bunch of things
about what we’re doing, which is that we’re coming up with new techniques for doing this, such as tagging and annotating photos. And we’re doing it in public, socially. We’re talking in a different tone of voice. The Library of Congress
sounds like a human being? Oh my God! How’d that happen? It happened because of the web where human beings can actually talk. And we can see the humanity
behind the beautiful facade of the building. So I want to talk about
some of the consequences of this for knowledge. I’m sorry, did I just offend people? Have I undone all the pandering I’ve done? (audience laughing) Okay, see me afterwards and I’ll apologize. So, we get to see what happens. Because of this digitizing of so much of the information and the metadata about the information, all in one huge miscellaneous pile if you will, we get to see what happens to knowledge when control goes down
and the amount of metadata and the nature of metadata goes up. And we have an infinite amount of space, not bound by four walls and an infinite amount of connection. You put those things together, these are things we get for free with the digital revolution, the networked revolution. And we get to see what
happens to knowledge. In some senses, it’s knowledge
taking a more natural shape unbound by the limitations
of the physical. So I’m gonna go as quickly as I can through a bunch of these. First, knowledge turns into an ecosystem. So we have Nature continuing, Nature magazine, refereed journals, high prestige; they continue doing their work, it’s good work. We need them. But we also get variations like PLOS ONE, which is refereed but will publish work even if it’s not important. In Nature, it has to be an important article. Well, there’s lots of good science going on that doesn’t count as important. PLOS will publish it. Still refereed. So this is arXiv. Well, this is the un-refereed place where anybody, any scientist with any research of any standing, can publish her work before it’s been sent to or approved by a traditional publisher. So way more stuff, but it’s of more uncertain quality. But that’s okay because arXiv is very clear about what it’s doing. It’s not pretending this is refereed. It tells you exactly
the nature of the stuff. This is just stuff that
somebody submitted. The person has to have
a minimal credential. But beyond that, we don’t know. Authority has always been metadata. And so the metadata for a site like arXiv is clear and we understand the authority. And then it goes all the way down the chain to where you get sort of the jokey sites, the Halfbakery, for ideas that are clearly maybe not great ideas. But that’s fine too. It’s funny and the
metadata makes it clear. The lack of control manifests
itself as complexity. So, President Bush two years ago, two and a half years ago I guess now, gave a talk on immigration. And you know that he did
what every president does, what every speechwriter and
presidential advisor says which is keep it simple. Exactly the same mantra as
every broadcaster just about because it’s the same thing. It’s one to many and so you
want to reach the most people so in order to do that, you file off the rough
edges of intelligence and you keep it simple. Every president. So he gave his talk: 2400 words on an immensely complex topic. Oh my God, it’s immigration. And within a couple of hours, there were 2400 blog posts. And you don’t have to have read them all to know what they’re doing, right? It’s what bloggers do. You know, you look at something. And you hold it up like this. And you turn it over. And you look at the underside and say, oh, did you notice that this is different from
what Bush was saying when he was governor? Oh, did you notice this Republican over here, Schwarzenegger, disagrees, or agrees, or I wonder how this affects. Will this be, I wonder how that. You take something simple and you find something
more complex about it and you call it out. And that’s what bloggers do. Because that’s what we do when we talk. This is how we have conversations. We take something simple
that we all agree on. And then we start, yeah,
no, that was a good movie but I wasn’t sure that, I’m
not sure Heath Ledger really, I didn’t understand why. That’s what we do. Now we have moments of
simplification also. And if you teach, or if you’re a student for that matter, you see this all the time: a teacher puts something simply, and you understand it. And as soon as the class understands it, she says, yeah, it’s not that simple. And then you start teasing it out. It’s a dialectic. We do this all the time. The problem is that we’ve had 100 years of a broadcast regime which has an economic interest in being simple. We’ve had 100 years of that. And one of the ways you can understand some of the enthusiasm for blogging, this sort of irrational enthusiasm for blogging which I completely share, is as a reaction to 100 years of being treated like an idiot. And now we get to talk and
we get to be complex. Aw, you noticed. When you have this sort of
complexity without control, you get messiness. So if this is what we look
like in the age of information, this is what we look like in the new age. There’s lots of information here. That information doesn’t go away. There’s tons of information. There’s more information than was ever put into a database. Lord knows way more than
in any corporate database that exists. We wouldn’t even allow this
stuff into a corporate database. Tons and tons of information. But it’s also all linked. And the links contextualize. Every time you link something, you’re adding, just as when
you categorize something, you’re adding to the information. You’re saying, when you
categorize, you say, oh this is a mammal, good. You just added information to this thing. When you add links, you’re also adding information. But it’s messy. It doesn’t go together well. It’s inconsistent within one
site and across sites. Okay. But it’s rich. It’s rich with ideas,
information, context, implications, the explicit, the implicit. That’s what we’ve been
doing for the past 15 years. We thought that knowledge was settled. And one of the things
that the web has done, the internet has done
for better and for worse is to unsettle knowledge. Knowledge used to (audience laughing) follow the path of publishing. That is, when you publish, you work on something in private. And then when it’s done, you publish it. And once it’s published, it’s done. And you have a strong
interest in defending it. And you do the revision,
the errata, the correction if you have to but it’s embarrassing. It’s done. And once it’s done, it’s done. Well things on the web are never done. They’re never done. Wikipedia is the greatest
example of it ever. But it’s certainly not just Wikipedia. It’s everything you post
that has comments on it. And your willingness to
go back and to fix it up and to change it. To send it around and redo it. Nothing is ever done on the web which is both good and bad. The good part is that in proper circumstances, it just gets better and better. It gets richer and more reflective and the bugs get driven out of it. You know, that’s why things on the web have version numbers ’cause
they’re never ever done. But it turns out that doneness is not a property of knowledge, it’s a property of publishers. It’s a property of printing presses. This stuff is very loosely ordered. We’ve had to divide things into topics because that’s how books work. You know, you’ve only got that much space. So you better define a topic
and a topic is something that fits within the
300 pages of the book. But that means severing information. Just as adding things to categories or adding tags adds information to them, severing things into topics removes information from them. So for example, at Britannica, the philosophy topic in the Macropædia is 180,000 words or so, which is like three good-sized volumes. That’s a lot of philosophy. In Wikipedia, it’s a mere 9000 words. And 9000 words is actually a
very long Wikipedia article. That’s longer than the recommended length of a Wikipedia article. Nevertheless, it’s one-twentieth the length. So how do we decide who has a better or more complete philosophy section? From this, it certainly
looks like Britannica but you know that those 9000
words are intensely linked. Every other word in the
article is linked just about. And those links spread out all over. They don’t just go to philosophy. So there’s literally
no way of figuring out how much philosophy Wikipedia discusses. And that’s not just because
we don’t have the patience or the grad students to be able to click on all the links across two million articles in English, though that does matter. It’s that even if we did, we wouldn’t know where to stop. We wouldn’t know exactly, does this article count as philosophy? I’m not sure, it sort of does. It’s about the economic
conditions in Germany when Hegel was writing. Is that philosophy? I don’t know, I’m not sure, maybe. This one’s about music. Well, there are some philosophers who think that music... We can’t know how many Wikipedia topics touch on philosophy
because we don’t know what philosophy is. It’s not a bounded topic despite the fact that there’s a beginning
and an end in Britannica to the topic. So in some ways, I would say Wikipedia is
a far better reflection of the richness of ideas, and of the limitations that topics impose on us, the severing of information that topics cause. When we have all of this internetworking, we have so much stuff and so much metadata by which we can recommend stuff to one another and pull things together. The only possible way of succeeding, of making our way through this (and this was not foreseen by the early information doomsayers), is that we’ll do it together, we’ll do it socially. And we do this all the time. We do it when we make lists. We do it when we Digg, and on other sites, recommendation sites like that. We do it in blog posts where we recommend, we link and we say, here’s something you should read. Go away from my site. Go read something else. This little act of selflessness. We do it all the time. We’re together recommending
and explaining why. We do it with mailing lists. Mailing lists are a phenomenal
example of social knowing. Where somebody posts something and somebody else comments. And even if the person who
wrote the first article is a wonderful expert on the topic, the fact that people are responding to her and asking questions and taking issue, and she’s responding, means literally the mailing list is smarter than the smartest person on it. Which also means, by the way, that one of our old ideas about knowledge, that knowledge is in our
heads, can’t be right. ’Cause the knowledge in this mailing list is exactly between everybody. These are four quick examples. But you know that these are the merest, meagerest set of examples. The web, the internet
overall is full of new ways to make one another smarter, to know more together. It’s maybe the single most exciting thing that we’ve been doing on the internet, and I would say culturally, since the mid-90s. So if you open up Britannica at random (and I pick on them because they are conventionally, traditionally taken to be the greatest English-language encyclopedia), you are right to believe what’s in there: simply by being included, something has credibility. Whereas if you were simply to open up Wikipedia at random, the fact that something is in Wikipedia doesn’t by itself confer credibility, because you might hit it at the moment that some jerk went through and, you know, trashed the whole thing. So that’s not where Wikipedia gets its credibility. Now, I think Wikipedia generally has a lot of credibility. So my question is, where does it get its credibility? We know Britannica gets its from a chain of credentialed experts and processes. And that works really well. It doesn’t scale all that well, but it works really well for the 65,000 articles in the Britannica. In Wikipedia, they don’t have a chain of credentialed experts. You are actually not allowed
to mention your credentials as an argument in favor of
what you are arguing for. So, it’s very much not
a credentialed chain. So where does it get its credibility? And it’s from a few places. It’s from people’s
familiarity with the topic and with the talk pages. But I want to point to one
particular thing especially, which is that Wikipedia encourages us to stick in these labels that say this article is not up to snuff. Not good enough. There are over 100 of these. They read like a list of the ways an encyclopedia can go wrong. And if you come up with a new one, if you need a new one, you go ahead and make it. It’s Wikipedia, right? Put it in the library. Somebody else will use it. So it’s a long list of these. It includes: it doesn’t
meet quality standards; it appears to contradict another article; it appears to contradict itself. (audience laughing) Not worthy of an encyclopedia. Needs to cite more sources. Reads like an advertisement. Reads like a sermon. And then there’s the really important one, which is: neutrality has been disputed. Because as you know,
Wikipedia is all about trying to come up with neutral articles expressed in neutral language. So it seems clear to me
that Wikipedia’s willingness to put in notices of its own weakness, of its own fallibility,
makes it more credible. That these notices tell us immediately instantly, that Wikipedia is on our side. It’s not trying to be more than it is. It’s not trying to be
the infallible expert. It just wants us to
understand our world better. And if an article is not doing the job, then that’s really important metadata, and it’s totally open; it encourages you to stick these up there. ’Cause that’s how Wikipedia gets better. And it’s how we understand better what the individual articles are saying. Wikipedia’s willingness to acknowledge its own human fallibility
makes it more credible. And the question is why
you will never see this here. (audience laughing) And in part, it’s because there’s an economic
and cultural investment in appearing authoritative. But it’s a rejection of the simplest fact about humans, which is that we’re fallible. We make mistakes. This is from the beginning,
how we differentiated ourselves from the divine. We go wrong all the time. All the time. We don’t get anything right. And when we do, we don’t agree about what we’ve gotten right. We are the fallible ones. And the unwillingness of institutions, whether it’s The Times
or some other newspaper or an encyclopedia or a textbook or a teacher or a politician, or a museum, the insistence on speaking in this tone of voice, We’re the New York Times and we’re here to tell you, that increasingly is going to alienate us. It increasingly is going to drive us away. We’re gonna think, these people are uncomfortable
with being human. And we don’t like people like that. Right now, it’s a really
big digital world. We don’t have to deal with them. And so we won’t. So, remove control, you
increase the amount of metadata. You provide infinite space
and infinite connection. And you end up with knowledge taking on these characteristics. It’s unsettled, it’s complex. It’s messy and so forth. But these are also
characteristics of being human. And these are also
characteristics of the net. What characterizes the
net is that it’s noisy, it’s messy, it contains difference. It’s imperfect. It’s always more wrong than it is right. It’s complex, it’s poorly organized, or it’s disorganized. It’s non-organized. It’s fallible. These are characteristics of the network. They are also characteristics
of being human. In a sense, it completely
should not be surprising. This is basic McLuhan. The medium of knowledge has been books. Now the medium of knowledge increasingly is the internet. So it takes on the characteristics of the internet, the characteristics of its medium. The thing that’s interesting to me is that those characteristics
are so profoundly human. So I want to end with just a quick nod to a couple of problems. A couple of challenges. Opportunities. Nah, just problems. So. And I will be very brief. The first is that we have to
get really good at metadata. There’s so much riding on
getting metadata right. Because in a digital world, there’s no difference between the content and the metadata to begin with. It’s all mixed up, all
content is connection. And because the only way of managing this gigantic flood of information in the world is through metadata, we have to get better and
better and better at metadata. We’ve been getting
really really good at it. But we’ve got to get even better at it. And the issue is that we don’t know, we don’t agree. It’s not settled what
it means to get better at metadata. We have one, there are
many, many sides here. But we have one set of people, for example, who think getting better at metadata means more rigorous standards, more controlled vocabularies, a semantic web that, to one degree or another, does a knowledge representation that’s well ordered, not hierarchical perhaps, but well ordered. To some people, that’s
what it means to get better at metadata, and to other people it means having better translation systems between the metadata systems. And for other people it means throwing open the floodgates of metadata, allowing everybody to tag and add metadata of any sort and then figuring out later, as best you can, how
it all goes together. And for somebody else, it means picking up the implicit metadata. We don’t agree about what it means to get better at metadata. But that’s okay because
we need to get better in all of those ways simultaneously. We need to get much
much better at metadata. We’re doing really well. We’re doing incredibly well. And this task is not going to end. This is going to be a perpetual task for our culture and all cultures. The second issue is, there are those of us
who are web optimists, web utopians, that’s me for example, who look out at the web and we see the web is where we join
together in diversity. And love difference and
enjoy embracing all of that, that’s one view. And the other view says, no. Take a look around, wake
up, smell the coffee. The web is a set of bunkers, echo chambers, where people talk only with
people who are like them. So this is a hugely difficult question. I’m not going to address it at the moment. As I say, I tend toward the first, but I certainly recognize the seriousness of the second position. I think that’s true also,
we do both of those things. The conclusion that I’m going to draw, rather than trying to
resolve this question. ‘Cause it actually, it can’t be resolved. It’s something going on
underneath the question that’s making the question unresolvable. The conclusion I draw therefore is, as a web utopian and web optimist, good but, we really have to work on diversity. The encountering of difference
is not going to happen enough just by itself. The existence of the web or the internet is not going to cause world peace. It’s not going to cause people
to appreciate difference. This is something that we
continually will have to work on just as we work on metadata. We don’t naturally get interested in that which is radically unlike us. And so we need to do
that which is unnatural over and over and over again. And we’re going to fail at it. We’re never going to achieve the sort of peace indifference, peace in difference, not peace indifference, sorry. We’re never going to achieve
that sort of peace naturally. It’s going to require massive, continual heartbreaking effort. But that’s okay. Because we’re in the age of abundance. We have this opportunity
now, we didn’t have. So even if we’re in the worst of bunkers and that view is almost entirely right, we have an opportunity we’ve never had. Because we have such an
abundance of content, but more important an
abundance of connection. We also have an abundance of crap. (audience laughing) The internet is more of everything. It’s more of the good
and it’s more of the bad. It’s more spam and porn and trash of every sort and hate of every sort
but it’s also more art and love and genuine
friendship and caring. It’s all of those things. It’s more of everything which is the opportunity that we have. Now it turns out we are far better at dealing with the abundance of crap. We know how to do that. We’ve been dealing with
an abundance of crap for a long time and on the net, we’re pretty good at it. So there’s way too much
spam but most of us still manage to get our email done. We’ve worked out
solutions, better or worse, to work on spam and we have
the family-friendly filters in place for our kids and so forth. We know how to deal with
the problem of too much bad. It’s the problem of too much good that’s really driving us crazy. That’s what’s terrifying to us. There’s so much good. If we don’t know how to
sort it, organize it, find it, recommend it, how to evaluate it, how to evaluate it against other things. The enormous amount of good is
the real challenge facing us. It’s not such a bad challenge to have as we exit the age of mere information with pain but also with some joy. Thank you. (audience clapping) So we’re gonna have
questions and talk I believe. – Okay, so any questions? Please come to the microphone. Yes. – [Man] Hi. Some of us have been
doing a thought experiment where we imagine that our job is to put the Smithsonian out of business. If we were a bunch of young titans wanting to create a collecting institution now, what might we do to make
our current institution essentially irrelevant? – Thank you for that question. (audience laughing) Oh I guess I have to be behind
here for the microphone. Well this is the healthy exercise
that businesses go through, usually at offsites, and
then they go drinking. So. (audience laughing) You should keep that in mind. They almost always end
in group drunkenness. So you have an enormous
amount of physical stuff that’s really valuable, important, and you need to maintain. So I’m gonna leave that aside. I’m taking that for granted. Correct?
– Correct. – Okay, ’cause you’re not proposing the digitizing and bonfire afterwards. (audience laughing) – [Man] There’s a
coordination of collections, stewardship, that we
are deeply in love with. – So. I’m sorry. (person clapping) (audience laughing) So I don’t think this will be helpful. One of the things to do, I’m involved in a, I’m
gonna leave that aside. Okay. So one of the things
to do, just to state the obvious, is whenever possible to aggregate the metadata, to, you know, be a collection of pointers rather than of stuff. Especially when you’re talking about digital stuff. There’s frequently no reason to own the stuff and probably no possibility of owning it. And trying to own it is going to get you into scaling issues anyway, because it’s gonna take much longer. Whereas with pointing, the pointers are going to break, of course they are. But you know, 404 is
a fact of life online. Aggregating as much stuff as you can simply as metadata moves you zippily along and covers some part of your mandate. I don’t know. That’s not gonna put you out of business. But there is a continuing
business that you need to be in. And part of your continuing business is also going to involve physical objects. So, that part of it, I don’t see how you
get put out of business when you’re dealing with the physical. I have a feeling I’m addressing
this particularly miserably. Would you like to expand
on my inadequacies? – [Man] I think one of the
questions we wrestle with is how to establish
reputation and trust now. A long time ago, it was enough
to own 100 million objects. – 143, isn’t it? – [Man] 137. And I’ve just read online
that we’re going to digitize all of them. (audience laughing) And I think your comment
about pointing is very apt and also about collection stewardship. We own 137 million but
there are countless more that we don’t and never
will that are germane. – Oh, I have nothing else. (audience laughing) I’m sorry. – [Woman] Lady in the back. – Oh so I will say, I’m sorry. In terms of authority, the institution has a
huge amount of authority. So I really like the fact that Library of Congress puts its
little badge up at Flickr. That’s really important metadata. I want to know that it’s
the Library of Congress who’s telling me how the
photos were digitized rather than, or taken,
rather than just somebody. So I wouldn’t be shy about establishing and using
marks of authentication. Authority is metadata. And you’re in a position to use it, where you actually have the authority. I mean it’s, why not. – [Woman] I’m sure you’ve been asked this. But how come I sort of
sense that, with everybody able to tag and everybody able to do everything, it’s somewhat anti-intellectual? – Because most people are not intellectual and resent people who are. – [Woman] Thank you. (audience laughing) – Have you not been paying
attention to politics in the past 50 years? (audience laughing) So this is clearly an issue. The good thing about tagging is that a folksonomy is not simply a replacement taxonomy where you take the work of professional taxonomists and you say, nah, we’re gonna go with what
the sweaty crowd thinks and replace it with a single folksonomy. The good thing about folksonomy is that there are multiple ways through them. And so you may discover
that people are thinking, they’re tagging, I’m sorry
to take a political example but that’s all I think about these days. So, you may discover that people
are tagging Obama, Muslim. Which by the way, he’s not. So you may not want to make
that the head of your taxonomy because you know that it’s wrong. And yet, in a folksonomy, I’ll make it up, let’s say 70% of the tags are Muslim and 20% are Christian and 5% are whatever. You can use all of that information to help people find information. It’s not winner takes all. Even a single tag that
somebody’s applied may be useful. Maybe only one person tagged that photo coif or Rosie the Riveter. Nevertheless, that will show
up somewhere in a search and be a useful way in. I’m not suggesting and
I don’t know anybody who is suggesting that
we throw out the work of trained professionals and replace them with the anti-intellectual
and sometimes quite ignorant and sometimes quite hateful tagging and so forth of the crowd. Rather, the aim is to create this mess, where there are as many different ways through the information as possible, including the valuable work done by the professionals. We don’t want to lose that. – [Woman] Okay, this lady over here. – [Woman] Thank you, ma’am. I work for a news organization. And one of the things I
wanted to maybe discuss was, you were talking about
curated news versus Wikipedia. And I just read an
article not too long ago that was talking about
how Wikipedia may replace The New York Times as a factual news site, because it’s more open and it’s not curated. It’s decentralized, it’s not
coming down from the top. Do you think that that’s a possibility that Wikipedia could
replace curated news sites? And also, one of the
things that I was thinking, in the New York Times, I don’t know if you’ve
noticed recently but the most emailed articles
are the op-ed articles. It’s not so much the fact based articles that people are going
to the news for anymore. It’s more opinionated information. – So I don’t know. I think it would be an
act of foolishness to predict, given that, you know, if you had asked six years ago,
ever replace the Britannica? And you’d have to be crazy
to predict anything close to the success of Wikipedia, right. It’s totally out of the blue. To the extent that Wikipedia
has replaced, you know, which it hasn’t. So I don’t know what would
replace The New York Times. Something will. It may be The New York Times. The New York Times would
desperately like it to be The New York Times. The editorial function left
the newsroom a long time ago, about 10 years ago. For people who are online, which I would say is probably all of us, we’re getting our news from online. We are going to the major news sites. But we’re also getting an enormous amount of our news not from those sites. We’re getting it
through the new front page which is continuous, it
comes through our mailbox. It’s on blogs. You know, you read a blog and you get an interesting link and you click on it. It leads you somewhere else. Oh I didn’t know this
about her, about him. And you get an email,
especially in this season. You’re getting dozens every day saying, did you know that John
McCain or Obama or whatever. And you follow that. That’s the new front page. And it has been for 10 years. And it’s curated in this haphazard way. The editorial function is
mainly the recommendation engine that is the internet. It’s mainly your social group. Your extended social network that’s providing the editorial function. For better or for worse. And I think there are reasons
to think it’s for better. There’s also some reasons
I think it’s for worse. Editorial function
already left the newsroom. What’s left there are
investigative reporting, which is expensive and very hard to do. And we don’t have good
sources for doing that yet. General-purpose coverage, which generally is not being done by the newspapers except at the local level; rather, the wire services are becoming more and more important as the foreign news bureaus get shut down. And the sort of magazine stuff, stuff that magazines do
like Op-Eds and analysis, much of which is tremendously
interesting and important which is why the top articles
in The New York Times are the op-eds and other
feature-like articles. There’s just no telling how
this is gonna shake out. I have zero predictions. And it’s the only realistic
prediction at all. I don’t know what’s going to happen to investigative journalism. I don’t know how we’re going to do that. I don’t know if we’re going to do it which is a pretty scary thought. I don’t know how we’re gonna get coverage. Are we gonna be satisfied with having two or three wire services
that provide the bulk of international coverage? Maybe we will, I don’t know. There’s a certain commodification of news that’s happened, so that the coverage stuff, which used to be the hallmark, every other website now has; they have the feed from the
AP on the right-hand frame. There’s a certain commodification
that’s already happened. This is one of the most interesting, scariest and maybe hopeful
areas that we’re in. I don’t know, I wish I did. You know, the fact that for some of us, the first inclination
is when you want to know about say, Sarah Palin
or some other new topic, is to go to Wikipedia and Wikipedia generally
doesn’t disappoint. Wikipedia generally
has a really, you know, and Fox is all biased
and the liberal media are all biased, and there’s Wikipedia, and their article has
been argued over by both sides. And it’s pretty factual
and it’s highly neutral. I’m surprised that’s where I go. But that’s where I go, and, you know, it’s usually pretty good.

– [Woman] Actually, I’ve
been thinking about this for quite a while and
maybe you can answer it. It sounds like you’re still accepting that nomenclature is here to stay especially through metadata and data. But in fact, classification
schemes all the way from Linnaeus to Library of Congress to you know, all of the
classification schemes out there are really on their way out. And if not on their way
out have to be changed. Not only philosophically
like phylogeny or whatever. But also in terms of actual
physical arrangement. The arrangement of books in
the Library of Congress, you know, the call numbers
and all that kind of stuff. It sounds like in the
future, that’s all gone. But maybe nomenclature will stay.

– (sighs) So. People organize things for a reason. They have some interests. In the case of, say what? (faint dialogue from audience) Well, yes, but when you organize your CDs
at home, if you have CDs (laughing), or your books or whatever, or your silverware, you’re doing it for a particular reason. You don’t alphabetize your spices because it’s not worth it; you don’t have that many. You want to put the
most used ones up front. And so there’s some you know,
reason why you’re organizing. This isn’t a natural organization. When you are organizing
materials without knowing why people are going to
be interested in them, as a library typically has to do, then you have to engage in this remarkable enterprise of trying to come
up with a classification system that is not driven by interests. We know how to do it by
interest, classifying things. Oh, I want the hot spices first ’cause I use them all the time. That, we know how to do. Coming up with a general-purpose
organization of spices, or of books, is extraordinarily difficult. The question is in what cases the amount
of effort that it takes to do that will be worth it. I don’t know that, if we
were starting from scratch building book libraries, we would try to come up with a Dewey system. I suspect not. Or with the Library of Congress system, which maybe we would. But maybe not, I don’t
know. If we were to think
about digital materials and how we’re gonna find stuff, I don’t know if we would want to come up with that sort of classification scheme. For digital materials, I suspect not. But we do have these things. There’s an enormous amount
of labor and intelligence and value built into them. An enormous amount of
information built into them. It seems to me, foolish
to cast all that aside. In the digital world, we’re in the fortunate
position of being able to use as many
schemes as we want. So there may be times
when you want to browse in a Dewey like fashion. That’s great. Maybe you can suck Dewey
into a faceted system and make Dewey more usable. I’m sorry, it’s a complex answer, not very satisfying. First is, there are going to be times when we want controlled vocabularies, when we want to have strict
hierarchical taxonomies because that serves some purpose. Great, that’s more information we can use. Starting from scratch, at times when we used to think
that the only reasonable way of organizing was in that manner, we’re gonna say no, it’s really not. And in many, many instances, even if we have one of these systems and we’re building it, we’re gonna get more value out of it by also opening it up. Like the University of
Pennsylvania library system, allow people to start doing
the tagging in addition and having that as a
second or third or ninth degree of organization that’s available. Yes, you want as many
layers as you possibly can, as many ways of sorting and ordering. There’s value in almost all of them. So, why are you dissatisfied
with this answer? (faint dialogue from audience) No? Is there another question? (audience laughing) Yes, of course! (faint dialogue from audience) But you have to use the microphone, so that we can all learn
what you’re thinking.

– [Woman] You led me to rethink the Library of Congress
classification scheme and how ineffective it
is in some categories and some subject areas. For instance, in the
sciences, the natural sciences, when you’re splitting up an animal instead of having the
whole animal together. Or in art, where there’s no
particular classification scheme that works well for photography. Things like that. And so I’m trying to... And you said something
that was helpful to me, this idea of people adding tags and other layers so that you’re
mixing up the information; it sounds like it might be helpful.

– So you see, at eBay, there is an informal taxonomy, a controlled vocabulary, that emerges. So if you want to find laptops, you should look, I can’t
remember which it is now. You should look for portable
computers, not for laptops. Or vice versa, I can’t
remember the example, sorry. And this is a
strictly folksonomic thing: people who were trying
to sell their laptops discovered that they got
more hits if they called them portables rather than laptops. And the next person did the same, and so forth. So there was a sort of
inertia towards laptop. So that’s fine. It doesn’t really matter
which one you call it. But it’s good to have one. And it’s good to know what
the controlled vocabulary is. If you’re gonna sell your laptop on eBay, it’d be nice if eBay told you, well, there are 10,000
laptops and a hundred million portable computers so you could
make up your mind which one. Surfacing that sort of metadata
is really really useful. But it’s also useful that
eBay has its own taxonomy that you can scroll
through and it’s actually, I don’t think it’s very good. I find it fairly confusing. Nevertheless, there are
times when you want to use that too, because of an ambiguous term, and you want to search
only within this one category and so forth. All these things are useful. It’s fantastic that eBay
has all of these things. Amazon is actually an even better example. The layers of sorting
and organizing techniques that they provide are dozens, literally dozens on any one page: ways of working through their site. Because sometimes you want to find books that other people like you read. And sometimes you want to find, you know, the statistically improbable
phrases thing they do, where they analyze the content and find phrases that are,
well, statistically improbable. And they show them to you and they let you search for other books. So if, for some reason, you
want to find other books that use the phrase heathen harpooner (audience laughing), you can! And if two people
on Earth ever did that and bought a book, Amazon’s
really, really happy. (audience laughing) ’Cause it didn’t cost them anything. It cost them a couple of CPU cycles, so. Do everything. That’s all we want. You know, it’s very simple. Just do everything. Yes, sir.

– [Man] It’s hard for me
to get to the microphone.

– I will repeat your question. In my own terms. Oh, I’m sorry, you’re right on the camera.

– [Man] At the beginning of your talk, you mentioned how it’s
become fashionable recently for people to consider DNA as information. And you indicated how you didn’t think that was quite accurate, that it was just these
squiggly bits of stuff. And I’m wondering in what
way DNA does not meet the litmus test for being
officially considered information.

– Information is an abstraction. So there are physicists,
there are quantum physicists, who would handily beat me in an argument or possibly to a pulp. (audience laughing) Who maintain that information
is a physical thing and subatomic theory is
understandable if we say that subatomic particles are
themselves information. I’ve read the books. There’s a great one by
Charles Seife, S-E-I-F-E. I don’t understand him and
so there’s a good chance I’m just plain wrong. Okay. But as I understand it, information is founded
on the notion of the bit and the bit is a symbolic element. A bit in a computer is the state
of electricity, of voltage. And it’s just by convention
that we treat high voltage as on, or one, and low voltage as zero. That’s symbolic. That’s why Kurzweil representing
his brain in a computer can’t be Kurzweil. We can do the same thing with beer bottles. I’m not sure whose example
this is, by the way. You hire 100
billion graduate students with beer bottles: down is off, up is on. You have the 100 billion
neurons modeled that way. You send them instructions, saying okay, neurons, beer
bottles one through seven, and you know, give them a
complete instruction set. And so they’re turning their beer bottles up and down like this, in sequence, in order to
simulate the brain state of Kurzweil
when he first saw his wife and fell in love. That’s a symbolic representation. It’s information. It’s not a brain, it’s not consciousness, it’s not experience. Similarly, somewhere off in the galaxy there’s a cloud of 100 million dust particles, and each is spinning this way or that way. And if we say left,
counterclockwise, is on and the other is off, then it’s Kurzweil falling in love. If, however, we say no, the other way is on and this way is off, then it’s Kurzweil tasting
strawberry ice cream. Information is symbolic. There’s a huge amount to learn about DNA by considering it as a code. But it is not a code. It’s a squishy molecule. And you can in fact, trace
the metaphors that we use, the framing we use for talking about DNA, back through the history of information. And there’s some very
good books that do this. I can’t remember the name
of the one I’m thinking of. The genetic mechanism,
DNA, was initially talked about in terms of a lock and key or a template. You’ve got RNA as a
template and it reads off. It’s a perfectly sensible way of doing it. You’re talking about a lock and key. Then information theory started, coming out of World War II, which includes not just the noise stuff
but the cryptography stuff, and that’s where
we get the coding of messages: the action of spies
became the metaphor for communication. The coding stuff goes
into information theory and comes out the other
side, and DNA is now a code. DNA is not a code. It doesn’t have messages. It’s a miraculous molecule
but it’s not a code. We can treat it as a code. And we can learn a lot about it. We can treat it as information. I’m fully in favor of that. I’m not a Luddite, but
it’s not information. (audience murmurs and laughs) Moving on. (laughing) Thank goodness for Steven Goldair and his... (faint dialogue from audience)

– [Man] This presiding
institution is enamored with collections care. We’re also enamored
with our own authority. And it seems to me that you’ve articulated connectivity and fallibility as two ways of authenticating authority
in this new world order. And I’m wondering if you
have any other suggestions about the ways in which that
metadata can be cultivated.

– I’m gonna answer this slightly, I’m gonna answer a slightly
different question. (audience laughing) On the one hand, we want people, we want institutions
to talk in human voices and acknowledge fallibility. And I’ll stand by that. But I also want to modify it a little bit. In business for example,
we want the same thing. And we also want in particular areas for the business to speak in ways that they’re held accountable for. And I’m not talking about
Enron and that sort of thing, though that too. I mean that when you look
up the specifications for the washing machine you just bought, you want the company to be right. And you assume simply by
the format of the page, because it’s in a table of specifications, that they measured accurately
and if they didn’t, you’re gonna sue them
because it’s not gonna fit in the hole you just sawed. And for that, we don’t want a lot of argument on the comment board afterwards. And authority waiting. We just want it to be right. And it better be right. There’s a commodification not only of news but of facts. And it started, well, it
started a long time ago. But you know, the creation of
almanacs did a lot for that. Here is a set of facts: you just look it up. You don’t argue about it. You just look it up and
you expect it to be right. There’s stuff in business that we want. And there’s stuff from
you that we want, as well. The basic facts, as in the Library of Congress example before, what we know about the photo: we expect you to be right about it. And we don’t expect you to burn in hell if you got the date wrong or something. But we don’t want to argue
about it, just be right. And so one of the tasks is to figure out what that level of commodified fact is. And to be bold and brazen about it. Don’t worry about it. We know we’ve got this one right. We know when this entered the collection. We know who painted it, it’s right. But also, then to allow
because you’re gonna get some of that stuff wrong too, of course, to allow mechanisms by
which there in fact can be some feedback and say you know what? I think actually, this was in
my family’s house till 1947. I think you got the
date wrong or whatever. There’s this balance, this mix, this impossible mix to get entirely right, that I believe you now need to engage in, where some of it is commodified. Even that stuff needs to
have a feedback mechanism and some room for discussion. But don’t be ashamed of it. It’s at the higher levels of interest that the voice of authority becomes more interesting. And the rule of thumb here, I think, is what it is that people
are gonna want to talk about. The stuff that we don’t
want to talk about, like what year it was painted: fine, just state it. The stuff that’s more interesting, that we want to hear from experts. We’re gonna value what they have to say. We’re gonna read the
marker next to the sign. And somebody’s gonna
disagree with part of it. And it would be good if there were ways to surface that disagreement, not on the sign, but somewhere. So this is an incredibly mixed up area. There is no one area where
mere authority works. There are times when we just want, just don’t use slang, don’t say it was, like,
painted in 1497, man. We don’t want that. We just want 1497; say it (clapping). And you don’t have to figure that out. I don’t know what the
hints are for doing that. Hi.

– [Man] Those of us who are
at some of the museums here, you know, work at institutions
that are considered almost kind of you know,
places of pilgrimage. Once, twice, kind of
three times in a lifetime. Places of visitation. That kind of model seems pretty different than the world you’re describing. So and this is, there’s a couple of things
I’ve been thinking about. So one is, in the context of what
you’re talking about. First of all, I can’t
ask you to generalize. But I guess, what would
you want from a museum from the physical
manifestation of a museum? And the second is, in the
world you’re describing, either physically or you know, online, in my case, I work with online
manifestations and museums. So I’m always thinking about
what’s the relationship between the physical and
the digital manifestation of my institution. So in that context, what do you see as the
difference between museums and libraries as we move into the...

– If you forced me to, if you got me good and drunk
and forced me to answer about the future of
libraries, I’d probably say that libraries are becoming museums, insofar as they are libraries. Insofar as they’re
collections of books, that is. The social space of libraries in towns, local libraries will evolve
away from museums, I think. But the collections area
feels to me like it’s likely to become more like a museum. Physical museums have to make decisions about what they’re gonna put where and what’s going to be interesting. And you know, curators do
wonderful jobs of doing that. And I understand that’s a very difficult, there’s an art to that. We don’t yet have the
hardware we’re gonna have. All right, I have a prop. I didn’t mean it as a prop
but it happens to be one. (metal zipping) Uh, that’s not the prop. Uh, this is so not worth it. (audience laughing) Oh my God, I lost my prop. Yeah, I had a Kindle
when I started out today. Oh! A Kindle. So this is a book reader and
it’s a pretty crappy one. I’m enjoying it, but it’s overpriced. And it’s disappointing. So we don’t even know what we’re gonna have for a book reader. The most basic thing that you would expect, in a culture that’s been book-based and not electronic, is to have a truly networked book reader. My big problem with the Kindle is it’s fine for reading books, but it’s not really on a network. We don’t have the stuff we’re gonna have. We don’t know what it’s gonna be. We don’t know if it’s
gonna be a BlackBerry or a wristwatch or a cellphone
or visors and implants. Or we don’t know what it’s gonna be. But presumably, we’re gonna have, so it’s very hard to design
for the hardware we don’t have. The hardware that we
have, that we will have, I assume, is going to tend towards ubiquity and connectedness. So that we’ll always be online
is what I’m trying to say. I have no idea what the
form factor is going to be. But there’s so much value to being online and connecting that to
where you are physically, that, we assume, it would become an important part of the museum experience. Right now I can rent a headset, a handset, and listen to experts making
sense of what I’m seeing. And you usually do because
it’s very valuable. But I’m sure you know about
the people creating their own museum tours, you know, MP3s for their iPods. It’s a great example of the sort of thing I’m interested in: the ability to interact with
what other people are saying, what they think about. I mean I go online to try
to understand TV shows: what happened in this TV show. A TV show, a work of art, a cultural object is so much richer, and being able to
surface that where I am seems to me inevitable. I just don’t know how it’s gonna happen. Inevitably, we’re gonna have
physical things arranged. It’s the layer on top of it, and what happens to that will depend upon the delivery mechanism that we have. But I assume it will be social. I assume that people will leave at least semi-persistent
markers and annotations. And that we will socialize
the knowing of it. Okay, so that’s totally unhelpful. My pleasure. (laughing) You know, we
haven’t invented it yet. Oh, sorry.

– [Man] Two questions
about searching by subject on the internet and in library catalogs. It doesn’t seem very good right now. On the internet,
using a search engine, that is, there’s no way to search
by subject that I know of. You can search in title,
colon, and you put your title. And in library catalogs, when non-librarians go to search by subject,
they just put in what they conceive of as the subject without consulting the big red books, the Library of Congress subject headings. And some of the subjects
are kind of complicated. For instance, for therapists for the blind, you use orientation and
mobility instructors. And I would never know that except that I looked it up this morning. (audience laughing) So. Should library subject headings be at the basic level, as
thought of by Eleanor Rosch, the psychologist? Or should they be complicated,
or does it matter? (audience laughing)

– (laughing) So. With Rosch, you’re
thinking about her work, the prototype theory and the
various levels of abstraction that we seem naturally to deal with. So the question is should we, I don’t know, let’s start with that. Subject headings are, you’re never gonna get
everybody on the web to use subject headings. There may come along an initiative that proves attractive enough
that there’s some markup that people will start using
to attach subject headings. For example, if Google were to say, we’re gonna start
allowing people to search by subject, and all you have to do
is put in subject colon. Now, they won’t do that because of spam. Right, this is why they pay
no attention to keywords, the metadata keywords
that are built into HTML. Nevertheless, it’s conceivable
that somebody will come up with something like that. In fact, it’s conceivable
that we already did and we call them tags. Right, that is in fact,
what tagging is all about. Trying to come up with a single, giving somebody authority
over what the subject of something is, you can do it. And there’s some value there. But it turns out that what
you think something is about, even as the author, is less important to me
than what I think it’s about as a reader. And so the reader’s
tagging or subject heading, adding subject headings is
always gonna have more value for users than those of others. The subject headings added
by professionals have value, have more value than just you know, Joe or Jane Blow adding them. Nevertheless, the value is always gonna be capable of being supplemented by others. There’s no way of getting it right, what the subject headings are. So, if you’re gonna add subject headings, I don’t know. It depends what you’re trying to do. So I don’t have a general
answer for the level of specificity for subject headings. I mean the idea of a taxonomy is that you have multiple, it’s hierarchical. You have multiple levels. But then you end up
with the problem you had which is it can be hard to predict. As anybody who’s ever used
the Yellow Pages knows, it can be hard to predict how the indexer or the
cataloger was thinking about it. Which is why it seems to me that, even in those cases where strict hierarchy and controlled vocabulary make sense, and there certainly are such places, it almost always will also make sense to allow it to be
supplemented by user tagging, classification, all the rest
of the stuff we can do now. So in terms of generality
of subject heading, I don’t know. I’m not a librarian, I’m not a cataloger. I’m a writer, though.

– [Woman] I wonder if you
could comment a little bit on the idea of public space
and private space on the web. For example, Google that we
talk about is a search engine that’s starting to affect
how people put together the metadata on their websites. Yeah, it’s a private company that sells advertisements primarily. Similarly, Facebook is
getting more and more into the advertising game and even Flickr was bought by Yahoo and
now is probably gonna start selling us stuff more than ever. So I wonder if you could
just comment on how this might look moving forward for the web as a social space, when it has all of these private companies that are trying to get into our wallets.

– Yeah, well, this is, it’s too bad that we have capitalism. And that’s what drives things forward. But that seems to be the case, right. And I’m actually okay with capitalism when it comes right down to it. I guess two things. First is that we’re in the process of
redrawing these lines. I understand the National
Archives, for example, has privatized the digitizing of archives and is now selling them to citizens. And okay, that’s a new
way of drawing the line. Maybe that will work, maybe it won’t. But we don’t know what works. We don’t know; we’re making this stuff up. And Harvard has a secret contract with Google to digitize its library. I mean, the fact of the contract is known, but the terms are not public. And I actually don’t
even know who has rights to which copies where. I mean, I don’t know. Is that a good thing? Harvard only has 37 billion
dollars in its endowment so you know, it wasn’t gonna
spend that on something like digitizing information, making it accessible openly. I mean that’s you know, come
on, let’s get serious here. So was that a good thing? I don’t know. It makes me a little sick
at heart to know that. On the other hand, how are we gonna get Google
except by doing what Google did? By having it compete with AltaVista, you know, the previous one, and make huge amounts of money, unbelievable amounts of
money that it reinvests. How are you gonna build
that except that way? Oh I know how. You do it the way that
Wikipedia was built, for free. Well, could we have done Google for free? Maybe but we didn’t. Maybe… We’re rewriting all of
these things as we go along. And we, especially those of
us who are in public service, in the public sector, need, as you do,
to think very carefully about what the implications are, and so do educational
institutions like Harvard. We need to think very hard
about the implications of taking what look like
either public resources or things that would have
such tremendous public good if they were made open, and handing them to for-profit companies. I wish there were a simple principle where we could say, No! Never! Never! In this country, we don’t
even have a commitment to, we have documents that are
locked into proprietary word-processing formats, like Word. And it doesn’t seem to bother anybody. This is up and down. Up and down. And I don’t have a general principle. It makes me queasy at times.

– [Woman] I think we have
time for one more question.

– Or five if you make them yes-or-no. (audience laughing)

– [Woman] Can you use
the microphone, please?

– [Woman] I just wanted
to say that this certainly throws a monkey wrench in teaching. Because either we’re teaching the students to look at databases. Let’s say, because we all touch databases when we’re in academic institutions. And we tell them that this
is the authoritative source. Or we tell them you be the judge. What do we do in this case? Because libraries are
spending a lot of money buying databases, subscribing to things, and we’re selling them as authoritative.

– Just to be clear, what do you mean by database?

– [Woman] Databases that are
commercially produced, like EBSCO, or something that comes from
Harvard or from Columbia. You know, like the Avery Index
of Architecture or journals, or I don’t know what comes
out of Harvard, actually. But Harvard itself is credible. (audience laughing) Is a credible institution, so.

– I’m only a fellow there, I have no... (audience laughing)

– [Woman] Well, I meant
that libraries actually spent a lot of money
buying things that are authoritative and we teach
them to students as such.

– Yeah, well. Obviously, clearly, I believe we want to teach
students to evaluate. That the old days of saying, oh no, here’s the list of
things and go memorize them is over. Because either the list is
so easy to find on Google that there’s no point in memorizing them. Our 17-year-old son has in two grades been required to memorize state capitals. Ah! Why! Or it’s not so easy to
look up in which case we want them to evaluate. It’s not a simple list. So, in the public
schools, you know my kids have gone through really
good public schools in Brookline, Massachusetts. And they are told, you may use two sources from the internet for this report. Please go to the library. I mean, they’re never gonna do that. As soon as they leave school,
they’re never going back to the library, they’re just
gonna go out on the internet. And so I would far, far, far
rather have the teachers say two things. First is, go out, bring back your sources and we’re gonna talk about them. And start teaching students in class, here’s how you tell. And here’s why you should
be worried about this one. And why you should believe that one. And the second thing
I want teachers to be telling students is, in effect (and I wouldn’t
quite put it this way), go write the textbook,
when they’re old enough. Okay, the class is
studying the Civil War. And we’re in sixth grade. And the class is gonna make
a wiki about the Civil War. Or, I don’t care about wikis, but whatever. And you’re gonna put it online and add to the public’s
knowledge and understanding. And you’re gonna do
this public thing okay. And everybody’s gonna contribute. And some of you are gonna
be better at writing reports and some of you are gonna be
better at raising questions or doing comments, and that’s fine. We want all of that. But go build this thing as if you were writing the textbook. And the teacher’s involved and looking. That’s what I would far prefer to: here’s a textbook, read it, come home, we’ll quiz you in the morning.

– [Woman] So we should
be judging the students on process rather than anything
else resulting from it?

– To a large degree, yes. I think that’s right. And in particular, on the
socializing of the process. Your students are sitting at home.
homework on the computer which means they have
four IM sessions open and they’re SMSing in between. (audience laughing) And they’re talking about
sports and their friends. But they’re also saying what
did you get for number three? I don’t understand the teacher’s question. Oh, here’s what she meant. And I’m writing this. They’re doing their homework together which is exactly what we want. We’re putting students
into sort of a need to know situation where they’re
not allowed to talk. That’s not how we operate. You have a question, you
want to know something. You go argue with somebody down the hall. And we’re not preparing
them for that at all. They’re doing their homework socially. And we’re still testing
them with their heads down on blue books, which has nothing to do with how we actually learn. The opportunity for
socializing learning is there now. And it’s making them more
cynical about school. No Child Left Behind. (audience laughing)

– Well, this has been a
really remarkable session with lots of interesting
discussion and questions. I’d like you all to join me once again in thanking our speaker
for his presentation. (audience clapping)
