Information Anarchy: A Survival Guide for the Misinformation Age – SANS CTI Summit 2018


(humming) (audience applauding) – So, my talk is on
information anarchy, and since basically nothing
Scott said about me was true, I’ll go ahead and tell you a
little bit more about myself. The most important thing, I
am totally not an anarchist. Which you might not
believe if you had seen my browser history lately and
all of the anarchy clip art I had to look up trying
to make this presentation. It’s actually, anarchy
clip art is a real thing and I couldn’t use any of it because it was on the
other side of the line of what I think I could
have gotten approved. I am a SANS instructor. I wrote a book with Scott called Intelligence-Driven
Incident Response, and I am a player of
random musical instruments. You all saw my ukulele skills. I also play the steel drums, and there was a brief
stint in high school where I was part
of a kazoo band. It ended poorly. Creative differences,
and it turns out kazoos are terrible instruments. But the most important
thing about my biography for this talk is that I run
threat intelligence for Rapid7. And if anyone’s
familiar with Rapid7, we have a lot of
different tools we use. We have a vulnerability scanner, incident detection and response, services and consulting. We have the Metasploit framework that we use for penetration testing. And so, I kind of
coordinate and liaise with all of these
different organizations. The goal of the threat
intelligence I provide to them is to get the right information
to the right people on time, and that’s really,
really important to me. It got really, really
difficult in 2017 to achieve that goal,
and that’s because people would come to me
usually maybe once a month and they would say
hey, I heard this thing and it sounds really scary
and what do I need to do? And so I would take
the time to figure out what is this thing, where
did they hear about it, and what can they do? And a lot of times, it
turned out that that wasn’t actually based
on accurate information. And once a month turned
into once a week, turned into once a
day, and my entire job was now being
spent tracking down misinformation and
bad information and things that
people didn’t know where they heard it from, but they’re really,
really scared, and they need to know what to do. Actually, Chris's
presentation earlier gave me the words to describe
what I was going through. I was using all of my
cognitive processing on tracking down
bad information, and I was mentally exhausted. It got to the point where
somebody would send me a link, any link, in Slack, and I
would be like, ugh, this again, and I would be angry. I decided I had to do
something about that. Plan A was I was
going to just stop and I was going to go
raise chickens somewhere, but my HOA doesn’t allow
us to have poultry, so I had to come up
with a better plan. All this pain led to the
research we started doing on misinformation,
how do I identify it, and how do I deal with the
current state that we are in, which I deemed
information anarchy? What is information anarchy? It's a term that I first heard as part of JDP 2-00, which
is the British military intelligence doctrine,
and they said that information anarchy is a state where you have
increasing amounts of
information coming in, and an increasing
lack of control or information about
where it's coming from or whether it's valid or not, and that leads to a state where it actually becomes more
difficult for people to make decisions based
on this information, because you don’t
know what’s important, you don’t know what’s relevant, you don’t know what
you need to act on. How did we get here? It’s not a new term and
it’s not a new concept even in information security. There was a paper in 1995
written by a gentleman named Donn Parker
called A New Framework for Information Security to
Combat Information Anarchy. 1995. I wish I had been paying a
little bit better attention but I was really caught up with
my kazoo band at that time, but 2018, we still
have this problem, we still have this issue. We came upon it
pretty gradually. For a brief history,
my brief history of the state of
information in our world, we started out with
information monarchy. One person, one organization,
often one government controlled all of
the information. They decided what people heard, they decided when it went out, and they weren’t
really interested in any sort of feedback. Unfortunately, there
are a lot of countries and a lot of regimes
that still exist in this type of
information state. After that, we started to
see information revolutions. People started
getting more educated. They realized, wait a minute,
this stuff you’re telling me is not necessarily the truth. It is not the end-all, be-all. I need more, I want more. Luckily, in a lot of cases,
they were successful, which brought us to
information democracy. We now have multiple places we
can get our information from. We can seek it out ourselves, we can watch the news and
have people tell it to us, we can read peer
review journals, and while I flagged
I’m not an anarchist, I should be saying, yay,
information democracy, that’s the best. But it had its problems as well, and one of the problems
is that people started, the news people who were
creating this information, started competing for readers. You want ratings, where
people spend their time and their money, that
is where they wanted the readers to be,
so we started seeing this fight for ratings,
and because of that we started seeing
people doing things like posting sensationalized stories full of fear, uncertainty, and doubt, the FUD, in order to get those ratings and become the best news source. So that was bad enough. But then, we had social media. And social media
basically allowed anybody to be their own news source. You can post anything you want. It doesn’t have to be real,
it doesn't have to be right. You can literally post anything. Don't actually try it, please, it hurts my heart. But after that, it becomes so hard to know what is a real news source. What is a new news source? What is just a blog somebody is running that sounds legitimate? And that brought us to our
state of information anarchy where we have so
much information. Some of it’s good,
some of it’s bad, and we just really
don’t often know a good way to find out, and
that’s where I was spending all of that cognitive
processing time, was trying to sort out
what’s good from what’s bad. Throughout this research,
we realized it isn’t just good information
and bad information. There’s lots of different types
of misinformation out there. These are some broad categories that I identified
through our research. There’s lots of
subcategories throughout it. But the first one is
innocent mistakes. Sometimes, some people
just get it wrong, and it’s not intentional. They’re following
their processes. The same happens to us
in intelligence analysis. We can do our best job, we
can make our best guesses, we can use all the tools
and resources we have, and sometimes it’s
still just not right. When this happens, though, and this used to happen a lot more often, it needs to be corrected when you identify it. If new information comes in, or you realize you made a mistake, it needs to be corrected,
what we’ve seen is that even when
it is corrected, it’s really hard to know that
something you read yesterday is no longer accurate. There is a really
good resource I found. It’s a website
called newsdiffs.org, N-E-W-S-D-I-F-F-S dot org. It's a project where they catalog a handful of the
most common media outlets, like the New York Times and
the Wall Street Journal, and they will provide updates on when articles
have been changed, so you can see the
changes side by side so you know when new
information has been introduced. But it is actually
really, really hard even to go in and
identify when somebody did make this kind of innocent
mistake in their reporting. The next one, I see this a lot, and I’ve determined that a
lot of my heartache in 2017 was because of this
type of misinformation. Hypothesis as a fact. A lot of these attacks we saw, WannaCry2 and NotPetya
and things like that, the commentary and
the information that was out there about
it was people trying to do their analysis
in real time, often over social media
or in the actual press. They might have been good ideas. They were definitely
something that somebody should continue to
research and find out what the outcome was, but
when they make it to the news and they make it to
something people are reading, it is really hard to
know whether or not you should act on it, because
they don’t actually tell you that this isn’t final. This is a problem
because people rush. When these new breaking
attacks come out or these new vulnerabilities
are announced, we want to get
information out quickly, but that speed often compromises our ability to provide accurate information. And it's not always intentional. I know I've had times
when something happens and someone calls me and
says hey, what’s going on? And I say, all right, well,
we’re still investigating. I don’t have all the details, but it looks like
it was X, Y, and Z. I’m pretty sure, then I read it, and somebody’s published that and they’ve cut out
all of my qualifiers, and it just says Rebekah
Brown says X, Y, and Z, and that’s really
unfortunate because now I’m contributing
to the confusion. The third category is
something I think we see a lot, and this is the only time in
my talk I’m gonna say this, when people talk
about fake news. It's someone pushing an agenda. This is essentially everything that crazy aunt of yours posts to Facebook, where you're like, really? You know that's not true. But it's something that confirms
somebody's existing biases. A lot of the cognitive biases we talk about in intelligence analysis are at play here, and we will find reports full of things like that. And it comes in multiple forms. It can be either somebody
trying to push an agenda of I want people to be
scared about their security, I want them to think
they’re vulnerable because then they’ll call us and they’ll need more
security services, or it can be somebody saying this is the political
agenda I want to push, and so I’m going to
pull everything out and formulate this news in a way that it is going
to encourage people to believe my way of thinking. And the fourth category is
intentional disinformation. We actually see
this pretty rarely in information security news. A lot of times what
we see are the second and third level effects of
a disinformation campaign, but it's pretty rare that there are formal active measures behind the things we respond to. Thomas talked about some active measures this morning. There's a really good
quote I like from a colonel in the East German foreign intelligence service who was in charge of their disinformation, and he says, our friends in Russia call it dezinformatsiya, our enemies in America
call it active measures, and as for me, I call
it my favorite pastime. Disinformation is an operation. This image comes from Operation Infektion, which was a Russian disinformation campaign that they ran in the 80s. They started inserting information and seeding conjecture, getting people to think that the AIDS epidemic that was being experienced in the U.S. was actually the result of U.S. biological weapons. We say that now and it
doesn’t make a lot of sense, but if you look at what
was going on at the time, we had just come out of Vietnam where biological
weapons were used. There were prisoners of war who
had been captured in Vietnam who gave coerced
confessions saying that yes, the U.S. is employing biological weapons, and during the Cold
War, when we’re trying to increase military spending, it was probably in
Russia’s best interest for the U.S. public to
not trust their government and not trust their military. This campaign was very involved. It involved planting research in scientific journals and all these sorts of activities, so misinformation
is not trivial, or disinformation is not
trivial, and we do see it. Like I said, we’ll see second
and third level effects, but in most cases, if
somebody’s just publishing a dumb article that
I have to respond to, it’s not disinformation,
it’s misinformation. All right, so how
can we identify it? It’s good to have a better
idea of what misinformation is and what kind of
categories things go into, but I still need
to free up my time. I still need to be
able to have shortcuts, they’re gonna help
me get through all of this bad
information so I can focus on what’s really
important to my job. There’s a couple
different techniques that I’ve come up with. Are there any
Zombieland fans in here? What’s rule number one? Cardio. Well, since Alex Pinto
has already claimed cardio is rule number one
from machine learning, I’m going to go ahead
and say rule number one for misinformation is sourcing. Where does the
information come from? Identifying the source
and being able to know whether or not
it’s a valid source is going to cut all
that information you have to respond
to by about half. When we were
responding to WannaCry, and trying to identify
what was going on, as we kind of worked
through this little war room across all of Rapid7,
I had one person who kept saying, I
have IOCs, I have IOCs, and they would send
me this list in Excel of some IPs and
domains, and he’s like, we need to push those
in for detection, and I’m like where,
what’s going on? One of the guys I
work with jokes that I am the great intel firewall, ’cause nothing goes into
our detection systems unless I approve it,
and that often involves knowing the source and
me being comfortable with the sourcing. When I finally got this
person to explain to me where he got those IPs from, his response was, literally,
Twitter, word of mouth, and probably some other places. (audience laughing) That made it pretty easy for me to be like, no. And it turns out, it was
a separate campaign, it was the (mumbles) campaign that had been running
at the same time, but the unfortunate thing is, if you look through a lot of
threat intelligence tools, especially aggregators,
you will see those IPs and those domains listed
as tied to WannaCry, because once something’s
on the internet, once something’s on Twitter,
if you’re mining Twitter, you’re gonna get the bad
information with the good. Even though we are
able to identify that that was not linked,
if you go look it up today, you will have a really hard
time figuring that out. You would have to go back to
the actual source of the source who later corrected it,
and said just kidding, totally different thing. It’s not that easy to do. When we’re looking at sourcing, the things you want to know are where did the
information come from? A lot of times, like
I said that itself will be like nope,
not acting on this. The next thing is can you
access the source material? If they say, oh it came from
a report from CrowdStrike. Okay, cool, can I
actually see that report? Can I get that report
to validate, to verify, to answer any follow up
questions that I have? A lot of times,
the answer is no, and that makes it really,
really difficult to do your job, but if you are able to
see the original source, who is being cited, what
report, what analysis, what Twitter post in some cases, that will give you
a lot better idea of whether or not it’s something
that you can follow up on. And then finally, was a structured analytic method used? When you read through this, whoever they're citing, wherever this information came from, did they go through
some sort of process to get from point A to point B? Carmen spoke about how
structured analytic techniques are not always necessary
and that’s absolutely true, but I want to know that
there was some sort of analytic process
that happened to get to the information I’m
being asked to respond to. The next thing we
looked at after sourcing was linguistic methods. One of the things I
wrote up in my bio is that I’m writing
my Master’s thesis, and my degree is in
homeland security with a cyber security focus, and a graduate certificate
in intelligence analysis, so I take a lot
of weird classes, and one of my favorites
was actually called intelligence
profiling of leaders, which, I read intelligence
profiling, and I was like whoa, pretty sure we’re not
supposed to do that, but when you’re
talking about leaders it was actually really
helpful to understand how these people who are
making decisions operate and how they think and
what motivates them. And they used things like
leadership trait analysis and sentiment analysis and
motive imagery analysis to look over texts
that people speak about to understand more about
what they’re saying. I decided to try and apply
some of those methods and we had to tweak obviously
some of the word lists and things to make
it more applicable to information
security research. But I ended up after looking
through about 100 articles, some good, some really, really bad. I wish I hadn't read some of those blog posts, but I did it for science, and I came up with four lists of things to look for in an article. The first is words of sourcing. Things like
according to, as per, they’re going to tell me that
this text that I’m reading was based on something. The next thing I looked for
was words of uncertainty. If you’ve done a lot
of intelligence work and you’ve written
intelligence reporting you know those words
of uncertainty are
really important. Those are things like possibly,
and could be, and might, that show kind of
how confident we are and when something’s a fact
and when it’s an assessment. The next thing I looked for
were explanatory phrases. Things like because
and therefore that show that they’re
doing some of that analytic explaining and
talking through their process and not just stating facts
with nothing to back it up. And then the fourth
thing was retractors. Those are words like
but, however, although. And retractors
serve two purposes. Sometimes, retractors
actually show complexity, which we’re going
to talk about next, meaning that
somebody can identify that there’s more than
one side to a story, or there might be concerns,
and they can address them as part of their own
analysis, and those are good, and then there’s
retractors where people kind of want to give
themselves a way out. They could be like, that was totally China who hacked those, but what do I know, it could be anybody. You've made this very
blatant statement and then you’ve kind of
tried to walk yourself back. People are gonna remember
that blatant statement, and then if it turns out you're wrong, you can go back and be like, hey, I said, what do I know? Retractors are
often used as a way of letting yourself state
something sensational without having to be
accountable for it later. So like I said, we did this
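The four word lists and the per-article normalization described here can be sketched as a simple frequency count. This is a minimal illustration in Python; the word lists are stand-ins I made up, not the actual lists from the research:

```python
import re
from collections import Counter

# Illustrative stand-in word lists; the real research lists differ.
CATEGORIES = {
    "sourcing": {"according", "per", "reported", "cited"},
    "uncertainty": {"possibly", "might", "could", "perhaps", "likely"},
    "explanatory": {"because", "therefore", "since", "thus"},
    "retractors": {"but", "however", "although", "though"},
}

def fingerprint(text):
    """Share of total words falling in each category (normalized, not raw counts)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return {cat: sum(counts[w] for w in wordlist) / total
            for cat, wordlist in CATEGORIES.items()}
```

Running known-good control reports through the same function gives a baseline to compare against, and an article that scores zero on uncertainty stands out immediately.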
across 100 different documents, refined our word
list, had some issues. May was one of our
words of uncertainty. We found huge spikes in
reports that came out in the month of May, so we
had to do some tweaking. I have a great
data science team, so I’m very, very thankful
they helped me with those, but for our case study, we
took seven different articles about the DNC hack. Five had some pretty
blatant misinformation and bad information. Two were ones we viewed as objective, conveying the facts, and we kind of ran them
and we kind of ran them through just to see what it looked like. What we found was a couple
different interesting things. Our explanatory
and our retractors were kind of
basically consistent. And this was, for anybody
who does data science, this was normalized by
words in the articles so we did a percentage
of words in the article rather than just a raw count, but what was interesting
was that the sourcing and the uncertainty
was all over the place. Some articles use a lot,
some articles use none. The first thing that jumped out when we used this first
analysis right here was our words of uncertainty
in this article, zero. And it was not a short article. You’re telling me there’s
not a single possibly, or might, or any room
for chance in that, so that jumped out
to me right away as something to look into. In addition to just
kind of looking at what they look
like in general, we took a look at each
article’s profile, so each of those
categories of their words across the whole
article, and again, a couple of things jumped out. The ones here that are
kind of more balanced, and they definitely have
some but no huge spikes, these were our good reports. Right off the bat, that told
me that okay, you know what, that balance and
that middle line, there's something to that. Again, we have our one with zero words of uncertainty, tons of sourcing, our
IVN, which stands for Independer Voting
Network and that hey, they’re saying they’re independ, and so I’m trusting
that they’re objective, but they had basically none. They had very, very small
counts of any of those words. We generated a heat map
because we love heat maps, and what we saw were
right around here,
this is the midline. Kind of that orange to
pinkish purple color, and so our two control documents were kind of more consistent
and more in the midline. Here’s our IVN report there
with basically nothing, and then we had a couple
of different reports with things like high
sourcing but not much else, and we could kind of start
to see the fingerprint. What does a good piece
of information look like versus something I know is bad? Again, I’m not looking
for perfection, I’m looking for shortcuts. I’m looking for something
that’s going to help me quickly identify whether or
not this is worth my time. We found a couple
of trends by then going back in and reading
more about the document or more about the articles
and looking for where it fit in those different categories, so we found a couple of trends. The first one, in documents
where they had high sourcing and high words of uncertainty, those tended to be more of
those hypothesis as facts. There was a source,
somebody was willing to talk to somebody, they
could say who was being cited, but there were lots of
high words of uncertainty. It could be, might be,
we’ll see, we’ll find out. Looking for that
particular pattern is a good way to identify
that type of misinformation. We also saw high
words of sourcing and then low uncertainty
and low explanatory fit the pattern of
pushing an agenda, and what we found when we went
back to look in more detail was that they had high sourcing but the sources were things like according to an
anonymous source, or according to somebody
close to the information and I’m like, wait a minute, you’re saying the right
words to make me think that you’ve talked to
somebody about this, but you’re not giving
me any information. The next stage of our research, we’re going to
start applying tags, so when we look for
words of sourcing we’re going to try and identify whether it’s a
person’s name versus somebody close to the President, ’cause let’s face it,
we’re all here in Bethesda. Compared to most of the world, we are relatively
close to the President. I feel like we can go ahead
and start citing things. The next one we found
is low everything, and this was our IVN report and a couple of
other opinion pieces. This is the profile of an
opinion piece pushing an agenda. There’s not a lot of sourcing ’cause it’s somebody’s
own opinion. There’s not a lot of
words of uncertainty because they are certain
of their position. Not a lot of explanations,
not a lot of retractors. When you see that low pattern, I would not even really
waste my time on that because that’s likely
to be opinion, not fact. And then none of the
information in our sample fit this profile, but we
started looking out for okay, what does disinformation
actually look like? We had to go back and
pull different documents from the CIA archives
that had been published and we found in a lot of cases that information that
was being sent out as kind of propaganda
and disinformation had high words of uncertainty
and high retractors and again, they want to be
able to back themselves out. All they want to do is sow those seeds of doubt in your mind, and there was low sourcing and low explanation. Again, that's just
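Taken together, the four profiles described in this section amount to a small decision rule. Here is a hedged sketch; the thresholds are invented for illustration, and in practice the cutoffs would come from baselining against known-good control documents:

```python
# Sketch of the four misinformation profiles as a triage rule.
# "hi"/"lo" thresholds are invented; real cutoffs would be derived
# from the normalized scores of known-good control documents.

def screen(profile, hi=0.02, lo=0.005):
    """profile: normalized category scores, e.g. from a word-list fingerprint."""
    s, u, e, r = (profile[k] for k in
                  ("sourcing", "uncertainty", "explanatory", "retractors"))
    if s >= hi and u >= hi:
        return "hypothesis as fact"        # sourced, but still speculative
    if s >= hi and u <= lo and e <= lo:
        return "pushing an agenda"         # cites "sources" but explains nothing
    if all(v <= lo for v in (s, u, e, r)):
        return "opinion piece"             # low everything
    if u >= hi and r >= hi and s <= lo and e <= lo:
        return "disinformation profile"    # doubt-seeding, unsourced
    return "no flag"
```

The "no flag" default matters: this is a shortcut for triage, not a verdict, and anything it can't pattern-match still deserves a human read.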
something to look for. Like I said, we’re
still kind of still tweaking our algorithms. We’ve already got more
things we want to look for, but we want to make this
mechanism that we use to do this quick
fingerprint available just in case it does
help other people kind of understand how to handle a lot of the information
that’s out there. The problem we found with
information security, though, is that that model really
only works with text, like with news articles
and media reporting and even long form
intelligence reports, we saw the same profiles. But it doesn’t work really well for things like malware analysis or technical blogs or
vulnerability disclosure, so what we started
looking for there is something called
integrative complexity, which is a score
from one to seven that looks at how
complex, not the content, but the way that
words are put together and the way that
they are structured. It looks for two things. Differentiation, how
well can I identify that there’s more than
one possible answer or outcome or consideration,
and then integration, which is how well
can I piece together different information and
then draw those connections between different things. It is really hard,
really, really hard, to automate
integrative complexity. A lot of people have tried. There’s some documentation
out there about how to do it, but the recommended
way to handle it is still like hand coding, which is not going to save me
any processing power there. What I started doing with this is coming up with
some rules of thumb for the very simple versus
the very complex documents with the idea that the
more complex it is, the more likely it has been
well researched and thorough. Some tips for that are
when you start to see words like just, or
always, or never, that’s the indication of a
more simplistic viewpoint. Look for synthesis and
multiple data sources. That’s when we talk
about that integration. If somebody’s taken more
things into account, they’ve probably done some
more thorough research. Look for counterpoints
or arguments to be preemptively addressed, especially with a lot
of things that we do. There are going to
be counterpoints, and if there’s an analyst
who can identify those and say well yes, we
know that sometimes people run this against
this type of system and so we would expect
to see this sort of thing but in this case, we saw, when they’re doing
that type of analysis I have a little more faith
that it’s worth my time. And then look for complexity
across the entire text, not just their
area of expertise. This is a big problem I found as kind of a result of
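Those rules of thumb can be roughed out as a quick screen. This is an assumption-laden heuristic, not a real integrative-complexity coder (which, as noted above, is still hand work), and both word lists here are illustrative:

```python
import re

# Illustrative lists: absolutist words suggest a simplistic viewpoint,
# connective phrases suggest the author is integrating multiple considerations.
ABSOLUTIST = {"just", "always", "never", "only"}
INTEGRATIVE = {"on the other hand", "at the same time", "in contrast",
               "taken together", "we would expect"}

def complexity_flags(text):
    lower = text.lower()
    words = re.findall(r"[a-z']+", lower)
    return {
        "absolutist_hits": sum(w in ABSOLUTIST for w in words),
        "integrative_hits": sum(lower.count(p) for p in INTEGRATIVE),
    }
```

A high absolutist count with no integrative phrasing is the "just patch" pattern: not necessarily wrong, but a flag that the advice may be more simplistic than the analysis behind it.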
doing this analysis, which is that any time we
talk about vulnerabilities, we find big problems or attacks
or people aren’t patching, our analysis of
information security, our domain where we’re
experts, super complex. We can see all the
different components, we can articulate
all the components, we know how they’re all related, and then when we say,
what needs to be done? Well, just patch. Jeez, you’re dumb, why
didn’t you already fix that? We cannot reach that
same level of complexity outside of information security to tell people how to
address the problems. That’s a whole another
line of research, but it’s just something
to be aware of, that you want that complexity
across the whole spectrum, not just one particular area. All right, so how do we survive, how did I find out
how to survive in the
misinformation age? Sourcing, look for where
the information came from. That’s going to be
your number one way of weeding out bad
information right there. Number two, content,
what are they saying? Look for some of those tips like according to an anonymous
source close to the President. Look for high words
of uncertainty. Like I said, you want
some, but you don’t want the whole thing to be a
high word of uncertainty or that could give
you an indication that it’s not actually final
research or final information. And then look at the structure. Like I said, any
time somebody says always or never or just,
I’m immediately suspicious. Sometimes, they are
good cases for that, but it is kind of a
good flag to look for. And then finally,
let’s not just survive. Let’s change things. Let’s not make it
more and more piled on of bad information. A lot of us in this
room, we have the ability to create content. We provide input to media,
to information report, to infosec reporting, to blogs. Make sure that you
are taking the time to do good analysis, to
identify your sources and validate them before you
start putting information out. It’s great if you go back
and fix it later if you need to, but the best way to make sure
the right information is to put it out there
and take the extra time to be thorough. Let’s raise the bar. And if that requires
raising a little hell and turning the current
way that we’re doing things on its head, I’m
okay with that, too. Like I said, totally
not an anarchist. If you want to learn
more, we started putting our basic research
that we’ve been doing on the thematic content analysis
into my github repo there. These are some additional
documents; they're not necessarily CTI documents, except for the very bottom one. That's actually a really
good current report on countering Russian
disinformation. But it’s some different
good things you can look at to better understand how
we can solve problems by looking outside of our
own domain of expertise. All right, I am out of time, but thank you so
much for being here. Thank you for staying, I
really, really appreciate it, and I hope to see you
all back next year. (audience applauding) (tense music)

1 Comment

  1. JonBays Sycomp

    December 17, 2018 at 12:45 am

    Really great information thanks

Leave a Reply