Federated Learning: Machine Learning on Decentralized Data (Google I/O’19)

[MUSIC PLAYING] EMILY GLANZ: Hi, everyone. Thanks for joining us today. I’m Emily, a software engineer
on Google’s federated learning team. DANIEL RAMAGE: And I’m Dan. I’m a research scientist
and the team lead. We’ll be talking today about
Federated Learning– machine learning on decentralized data. The goal of federated learning
is to enable edge devices to do state-of-the-art machine
learning without centralizing data and with privacy by
default. And, with privacy, what we mean is that we have an
aspiration that app developers, centralized servers, and
models themselves learn common patterns only. That’s really what
we mean by privacy. In today’s talk, we’ll talk
about decentralized data, what it means to work with
decentralized data in a centralized fashion. That’s what we call
federated computation. We’ll talk a bit about
learning on decentralized data. And then we’ll give
you an introduction to TensorFlow Federated,
which is a way that you can experiment with federated
computations in simulation today. Along the way, we’ll introduce
a few privacy principles, like ephemeral reports,
and privacy technologies, like federated model averaging
that embody those principles. All right, let’s start
with decentralized data. A lot of data is
born at the edge, with billions of phones and
IoT devices that generate data. That data can enable better
products and smarter models. You saw in yesterday’s
keynote a lot of ways that that data can
be used locally at the edge, with
on-device inference, such as the automatic captioning
and next generation assistant. On-device inference offers
improvements to latency, lets things work offline, often
has battery life advantages, and can also have some
substantial privacy advantages because a server
doesn’t need to be in the loop for
every interaction you have with that
locally-generated data. But if you don’t have
a server in the loop, how do you answer
analytics questions? How do you continue to improve
models based on the data that those edge devices have? That’s really what we’ll be
looking at in the context of federated learning. And the app we’ll
be focusing on today is Gboard, which is
Google’s mobile keyboard. People don’t think much
about their keyboards, but they spend hours
on it each day. And typing on a mobile
keyboard is 40% slower than on a physical one. It is easier to share
cute stickers, though. Gboard uses
machine-learned models for almost every aspect
of the typing experience. Tap typing, gesture typing
both depend on models because fingers are a little
bit wider than the key targets, and you can’t just
rely on people hitting exactly the right keystrokes. Similarly, auto-corrections
and predictions are powered by learned
models, as well as voice to text and other aspects
of the experience. All these models run
on device, of course, because your keyboard
needs to be able to work offline and quickly. For the last few
years, our team has been working with
the Gboard team to experiment with
decentralized data. Gboard aims to be the best and
most privacy forward keyboard available. And one of the ways that
we’re aiming to do that is by making use of an on-device
cache of local interactions. This would be things like touch
points, type text, context, and more. This data is used exclusively
for federated learning and computation. EMILY GLANZ: Cool. Let’s jump into
federated computation. Federated computation
is basically a MapReduce for
decentralized data with privacy-preserving
aggregation built in. Let’s introduce some
of the key concepts of federated computations using
a simpler example than Gboard. So here we have our clients. This is a set of devices– some things like cell phones,
or sensors, et cetera. Each device has its own data. In this case, let’s imagine
it’s the maximum temperature that that device
saw that day, which gets us to our first
privacy technology– on-device data sets. Each device keeps
the raw data local, and this comes with
some obligations. Each device is responsible
for data asset management locally, with things
like expiring old data and ensuring that the data is
encrypted when it’s not in use. So how do we get the
average maximum temperature experienced by our devices? Let’s imagine we
had a way to only communicate the average
of all client data items to the server. Conceptually, we’d like
to compute an aggregate over the distributed data in a
secure and private way, which we’ll build up to
throughout this talk. So now let’s walk
through an example where the engineer wants to
answer a specific question about the decentralized data,
like what fraction of users saw a daily high over
70 degrees Fahrenheit. The first step would
be for the engineer to input this threshold
to the server. Next, this threshold
would then be broadcast to the subset
of available devices the server has
chosen to participate in this round of
federated computation. This threshold is then compared
to the local temperature data to compute a value. And this is going
to be a 1 or a 0, depending on whether the
temperature was greater than that threshold. Cool. So these values would
then be aggregated using an aggregation operator. In this case, it’s
a federated mean, which encodes a protocol for
computing the average value over the participating devices. The server is responsible
for collating device reports throughout the round and
emitting this aggregate, which contains the answer to
the engineer’s question. So this demonstrates our
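The round just described (broadcast a threshold, compute a 1 or 0 locally, keep only the mean) can be sketched as a toy Python simulation; the names and values here are illustrative, not the production system:

```python
# Simulation of one round of the federated computation described above.

def client_update(local_max_temp, threshold):
    """Runs on each device: compare local data to the broadcast threshold."""
    return 1.0 if local_max_temp > threshold else 0.0

def federated_round(device_temps, threshold):
    """Server side: broadcast, collect per-device values, keep only the mean."""
    reports = [client_update(t, threshold) for t in device_temps]  # ephemeral
    return sum(reports) / len(reports)  # only the aggregate is persisted

fraction = federated_round([68.0, 75.0, 81.0, 62.0], threshold=70.0)
print(fraction)  # 0.5: half the devices saw a daily high over 70°F
```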
second privacy technology of federated aggregation. The server is combining
reports from multiple devices and only persisting
the aggregate, which now leads into our first privacy
principle of only in aggregate. Performing that federated
aggregation only makes the final aggregate
data, those sums and averages over the device reports,
available to the engineer, without giving them access to
an individual report itself. So this now ties into
our second privacy principle of ephemeral reports. We don’t need to keep
those per-device messages after they’ve been
aggregated, so what we collect only stays around
for as long as we need it and can be
immediately discarded. In practice, what
we’ve just shown is a round of computation. This server will repeat
this process multiple times to get a better estimate
to the engineer’s question. It repeats this multiple times
because some devices may not be available at the
time of computation or some of the devices may have
dropped out during this round. DANIEL RAMAGE: So
what’s different between federated computation
and decentralized computation in the data center with
things like MapReduce? Federated computation
has challenges that go beyond what
we usually experience in distributed computation. Edge devices like phones tend
to have limited communication bandwidth, even when
they’re connected to a home Wi-Fi network. They’re also intermittently
available because the devices will generally participate only
if they are idle, charging, and on an unmetered network. And because each
compute node keeps the only copy of its
data, the data itself has intermittent availability. Finally, devices
participate only with the user’s permission,
depending on an app’s policies. Another difference is that
in a federated setting, it is much more distributed
than a traditional data center distributed computation. So to give you a sense of
orders of magnitude, usually in a data center,
you might be looking at thousands or maybe
tens of thousands of compute nodes, where
this federated setting might have something like a
billion compute nodes. Maybe something
like 10 million are available at any given time. Something like
1,000 are selected for a given round
of computation, and maybe 50 drop out. That’s just kind of a
rough sense of the scales that we’re interested
in supporting. And, of course, as
Emily mentioned, privacy preserving
aggregation is kind of fundamental to the
way that we think about federated computation. So when you posed this
set of differences, what does it actually
look like when you run a computation in practice? This is a graph of
the round completion rate by hour over the course of
three days for a Gboard model that was trained in
the United States. You see this periodic structure
of peaks and troughs, which represent day versus night. Because devices are only
participating when they’re otherwise idle
and charging, the peaks of the round completion rate are when more devices
are plugged in, which is usually when
they’re charging on someone’s nightstand as they sleep. Rounds complete faster when
more devices are available. And the device availability
can change over the course of the day. That, in turn, implies a
dynamic data availability because the data itself
might be slightly different from the users
who plug in phones at night versus the day,
which is something that we’ll get back to when we
talk about federated learning in particular. Let’s take a more in-depth
example of what a federated computation looks like– the relative typing frequencies
of common words in Gboard. Typing frequencies are actually
useful for improving the Gboard experience in a few ways. If someone has typed
the letters H-I, “hi” is much, much more likely
than “hieroglyphic.” And so knowing those
relative word frequencies allows the Gboard team to
make the product better. How would we compute these relative typing frequencies as a federated computation? Instead of specifying a single threshold, the engineers would now specify something like
a snippet of code that’s going to be running
on each edge device. And in practice, that will often
be something that’s actually in TensorFlow,
but for here, I’ve written it as Python-ish pseudocode. So think of that device
data as each device’s record of what was typed in recent
sessions on the phone. So for each word in
that device data, if the word is in one of
the common words we’re trying to count, we’ll
increase its count when the local device updates. That little program is what
would be shipped to the edge and run locally to
compute a little map that says that perhaps this phone
typed the word “hello” 18 times and “world” 0 times. That update would then
be encoded as a vector. Here, the first
element of the vector would represent the
count for “hello” and the second one for
the count for “world,” which would then be
combined and summed using the federated aggregation
operators that Emily mentioned before. At the server, the engineer
would see the counts from all the devices that have
participated in that round, not from any single
device, which brings up a third
privacy principle of focused collection. Devices report
only what is needed for this specific computation. There’s a lot more richness
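The counting pipeline just described can be sketched the same way; this is a plain-Python simulation with an illustrative two-word vocabulary, not the actual TensorFlow snippet that ships to devices:

```python
# Each device maps its local typing history to a count vector for a fixed
# vocabulary, and the server only ever sees the element-wise sum of those
# vectors, never an individual device's report.
COMMON_WORDS = ["hello", "world"]

def local_counts(device_words):
    """On-device: count only the words this computation asked about."""
    vector = [0] * len(COMMON_WORDS)
    for word in device_words:
        if word in COMMON_WORDS:
            vector[COMMON_WORDS.index(word)] += 1
    return vector

def federated_sum(vectors):
    """Server: combine reports element-wise; individual reports are discarded."""
    return [sum(col) for col in zip(*vectors)]

reports = [local_counts(["hello", "hello", "hey"]), local_counts(["world"])]
print(federated_sum(reports))  # [2, 1]
```

Asking a different question, such as counting a different set of words, would mean shipping a different vocabulary and running a different computation.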
in the on-device data set that’s not being shared. And if the analyst wanted
to ask a different question, for example, counting a
different set of words, they would run a
different computation. This would then repeat
over multiple rounds, getting the aggregate counts
higher and higher, which in turn would give us
better and better estimates of the relative frequencies
of the words typed across the population. EMILY GLANZ: Awesome. Let’s talk about our
third privacy technology of secure aggregation. In the previous example,
we saw how this server only needs to emit the sum of
vectors reported by the devices. The server could compute this
sum from the device reports directly, but we’ve
been researching ways to provide even
stronger guarantees. Can we make it so the
server itself cannot inspect individual reports? That is, how do we enforce
that only in aggregate privacy principle we saw from before in
our technical implementation? Secure aggregation is
an optional extension to the client/server protocol
that embodies this privacy principle. Here’s how it works. So this is a simplified
overview that demonstrates the key idea
of how a server can compute a sum without being
able to decrypt the individual messages. In practice, handling phones
that have dropped partway is also required
by this protocol. See the paper for details. Awesome. So let’s jump into this. Through coordination
by the server, two devices are going to agree
upon a pair of large masks that when summed add to 0. Each device will add these
masks to their vectors before reporting. All devices that
are participating in this round of
computation will exchange these zero-sum pairs. Reports will be completely
masked by these values, such that we see that
these added pairs now make each individual report
themselves look randomized. But when aggregated together,
the pairs cancel out, and we’re left with only
the sum we were looking for. In practice, again,
this protocol is more complicated
to handle dropout. So we showed you what you can
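The zero-sum masking idea can be illustrated with a toy Python sketch. This omits everything that makes the real protocol work at scale (dropout handling, cryptographic key agreement, finite-field arithmetic); it only shows why a pair of masks cancels in the sum:

```python
# Two devices agree on masks m and -m. Each masked report looks random to
# the server, yet the sum of all masked reports equals the true sum.
import random

def make_zero_sum_masks(vector_len):
    """A pair of devices agrees on masks that sum element-wise to zero."""
    m = [random.randint(-10**9, 10**9) for _ in range(vector_len)]
    return m, [-x for x in m]

def mask(report, pair_mask):
    """On-device: add the agreed mask before reporting."""
    return [v + m for v, m in zip(report, pair_mask)]

reports = [[18, 0], [3, 7]]
m_a, m_b = make_zero_sum_masks(2)
masked = [mask(reports[0], m_a), mask(reports[1], m_b)]
# The server sums the masked reports; the zero-sum pair cancels out.
total = [sum(col) for col in zip(*masked)]
print(total)  # [21, 7], identical to summing the raw reports
```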
do with federated computation. But what about the much more
complex workflows associated with federated learning? Before we jump into
federated learning, let’s look at the
typical workflow a model engineer who’s
performing machine learning would go through. Typically, they’ll have
some data in the cloud where they start training and
evaluation jobs, potentially in grids to experiment with
different hyperparameters, and they’ll monitor how well
these different jobs are performing. They’ll end up with
a model that will be a good fit for the
distribution of cloud data that’s available. So how does this
workflow translate into a federated
learning workflow? Well, the model
engineer might still have some data in the
cloud, but now this is proxy data that’s similar
to the on-device data. This proxy data might be useful
for training and evaluating in advance, but our
main training loop is now going to take place
on our decentralized data. The model engineer
will still do things that are typical of a
machine learning workflow, like starting and
stopping tasks, trying out different learning rates or
different hyperparameters, and monitoring their performance
as training is occurring. If the model performs well on
that decentralized data set, the model engineer now has
a good release candidate. They’ll evaluate this
release candidate using whatever validation
techniques they typically use before deploying to users. These are things you can do
with ModelValidator and TFX. They’ll distribute this final
model for on-device inference with TensorFlow Lite
after validation, perhaps with a rollout
or A/B testing. This deployment
workflow is a step that comes after federated
learning once they have a model that works well. Note that the model
does not continue to train after it’s been
deployed for inference on device unless the
model engineer is doing something more advanced,
like on-device personalization. So how does this federated
learning part work itself? If a device is
idle and charging, it will check into the server. And most of the time,
it’s going to be told to go away and come back later. But some of the time, the
server will have work to do. The initial model as dictated
by the model engineer is going to be
sent to the phone. For the initial model, usually
0s or a random initialization is sufficient. Or if they have some of
that relevant proxy data in the cloud, they can also
use a pre-trained model. The client computes an
update to the model using their own local training data. Only this update is
then sent to the server to be aggregated,
not the raw data. Other devices are
participating in this round, as well, performing their own
local updates to the model. Some of the clients may drop out
before reporting their update, but this is OK. The server will aggregate
user updates into a new model by averaging the model
updates, optionally using secure aggregation. The updates are ephemeral and
will be discarded after use. The engineer will be
monitoring the performance of federated training
through metrics that are themselves aggregated
along with the model. Training rounds will
continue if the engineer is happy with model performance. A different subset of devices
is chosen by the server and given the new
model parameters. This is an iterative
process and will continue through many training rounds. So what we’ve just described is
our fourth privacy technology of federated model averaging. Our diagram showed
federated averaging as the flavor of
aggregation performed by the server for
distributed machine learning. Federated averaging
works by computing a data-weighted average
of the model updates from many steps of gradient
descent on the device. Other federated optimization
techniques could be used. DANIEL RAMAGE: So
what’s different between federated learning
and traditional distributed learning inside a data center? Well, it’s all the
differences that we saw with federated computation
plus some additional ones that are learning specific. For example, the data
sets in a data center are usually balanced in size. Most compute nodes will
have a roughly equal size slice of the data. In the federated setting, each
device has one user’s data, and some users might use
Gboard much more than others, and therefore those data set
sizes might be very different. Similarly, the data in
federated computation is very self-correlated. It’s not a representative
sample of all users’ typing. Each device has only
one user’s data in it. And many distributed training
algorithms in the data center make an assumption
that every compute node gets a representative
sample of the full data set. And, third, that variable
data availability that I mentioned earlier– because the people whose
phones are plugged in at night versus plugged in during the
day might actually be different, for example, night shift workers
versus day shift workers, we might actually
have different kinds of data available at
different times of day, which is a potential
source of bias when we’re training federated models
and an active area of research. What’s exciting is the fact
that federated model averaging actually works well for a
variety of state-of-the-art models despite
these differences. That’s an empirical result.
When we started this line of research, we didn’t know
if that would be true or if it would apply widely to the kinds
of state-of-the-art models that teams like Gboard are
interested in pursuing. The fact that it does work
well in practice is great news. So when does federated
learning apply? When is it most applicable? It’s when the
on-device data is more relevant than the server-side
proxy data, or it’s privacy-sensitive, or large in ways that
would make it not make sense to upload. And, importantly,
it works best when the labels for your
machine-learned algorithm can be inferred naturally
from user interaction. So what does that naturally
inferred label look like? Let’s take a look at some
examples from Gboard. Language modeling is one of
the most essential models that powers a bunch
of Gboard experiences. The key idea in
language modeling is to predict the next word
based on typed text so far. And this, of course, powers
the prediction strip, but it also powers other aspects
of the typing experience. Gboard uses the
language model also to help understand as you’re tap
typing or gesture typing which words are more likely. The model input in this case
is the typed-in sequence so far, and the output is whatever
word the user had typed next. That’s what we mean
by self-labeling. If you take a
sequence of text, you can use every prefix of that
text to predict the next word. And so that gives a series
of training examples as result of people’s natural
use of the keyboard itself. The Gboard team ran
dozens of experiments in order to replace their
prediction strip language model with a new one based on
a more modern recurrent neural network architecture, described
in the paper linked below. On the left, we see a
server-trained recurrent neural network compared to
the old Gboard model, and on the right, a
federated model compared to that same baseline. Now, these two model
architectures are identical. The only difference is that one
is trained in the data center using the best available
server-side proxy data and the other was trained
with federated learning. Note that the newer architecture
is better in both cases, but the federated
model actually does even better than
the server model, and that’s because the
decentralized data better represents what
people actually type. On the x-axis here for
the federated model, we see the training
round, which is how many rounds of
computation did it take to hit a given
accuracy on the y-axis? And the model tends to converge
after about 1,000 rounds, which is something like a
week on wall clock time. That’s longer than
in the data center, where the x-axis
measures the step of SGD, where we get to a similar
quality in about a day or two. But that week long
time frame is still practical for machine
learning engineers to do their job because they can
start many models in parallel and work productively
in this setting, even though it takes
a little bit longer. What’s the impact of that
relatively small difference? It’s actually pretty big. The next word
prediction accuracy improves by 25% relative. And it actually
makes the prediction strip itself more useful. Users click it about 10% more. Another example that the Gboard
team has been working with is emoji prediction. Software keyboards have
a nice emoji interface that you can find,
but many users don’t know to look there
or find it inconvenient. And so Gboard has
introduced the ability to predict emoji right in line
on the prediction strip, just like next words. And the federated
model was able to learn that the fire emoji is
an appropriate completion for this party is lit. Now, on the bottom,
you can see a histogram of just the overall
frequency of emojis that people tend to type, which
has the laugh/cry emoji much more represented. So this is how you know
that the context really matters for emoji. We wouldn’t want to make that
laugh cry emoji just the one that we suggest all the time. And this model ends up
with 7% more accurate emoji predictions. And Gboard users actually click
the prediction strip 4% more. And I think, most
importantly, there are 11% more users who’ve
discovered the joy of including emoji in their texts, and
untold numbers of users who are receiving those
wonderfully emojiful texts. So far, we’ve focused on
the text entry aspects, but there are other components
to where federated learning can apply, such as action
prediction in the UI itself. Gboard isn’t really
just used for typing. A key feature is
enabling communication. So much of what people
type is in messaging apps, and those apps can
become more lively when you share the perfect GIF. So just helping people
discover great GIFs to search for and share from
the keyboard at the right times without getting
in the way is one of Gboard’s differentiating
product features. This model was
trained to predict from the context so far, a
query suggestion for a GIF or a sticker, a search or emoji,
and whether that suggestion is actually worth showing
to the user at this time. An earlier iteration
of this model is described at the
paper linked below. This model actually
resulted in a 47% reduction in unhelpful suggestions,
while simultaneously increasing the overall rate of emoji,
GIF and sticker shares by being able to better
indicate when a GIF search would be appropriate, and that’s what
you can see in that animation. As someone types “good
night,” that little “g” turns into a little
GIF icon, which indicates that a good
GIF is ready to share. One final example that I’d
like to give from Gboard is the problem of
discovering new words. So what words are people typing
that Gboard doesn’t know? It can be really
hard to type a word that the keyboard doesn’t
know because it will often auto-correct to something
that it does know. And Gboard engineers can use
the top typed unknown words to improve the
typing experience. They might add new common
words to the dictionary in the next model release
after manual review or they might find out
what kinds of typos are common, suggesting
possible fixes to other aspects of the typing experience. Here is a sample of words
that people tend to type that Gboard doesn’t know. How did we get
this list of words if we’re not sharing
the raw data? We actually trained
a recurrent network to predict the sequence
of characters people type when they’re typing words
that the keyboard doesn’t know. And that model, just like the
next word prediction model, is able to be used to sample
out letter by letter words. We then take that model in the
data center, and we ask it. We just generate from it. We generate millions
and millions of samples from that model that are
representative of words that people are typing
out in the wild. And if we break these
down a little bit, there is a mix of things. There’s abbreviations,
like “really” and “sorry” missing their vowels. There’s extra letters added
to “hahah” and “ewwww,” often for emphasis. There are typos that
are common enough that they show up even though
Gboard likes to auto-correct away from those. There are new names. And we also see examples of
non-English words being typed in an English language keyboard,
which is what this was– English in the US was what
this was trained against. Those non-English words
actually indicate another way that Gboard might improve. Gboard has, of
course, an experience for typing in
multiple languages. And perhaps there’s ways that
that multilingual experience or switching language more
easily could be improved. This also brings us to our
fourth privacy principle, which is don’t memorize
individuals’ data. We’re careful in this case
to use only models aggregated over lots of users and trained
only on out of vocabulary words that have a particular
flavor, such as not having a sequence of digits. We definitely don’t
want the model we’ve trained in federated
learning to be able to memorize someone’s credit card number. And we’re looking
further at techniques that can provide other kinds of
even stronger and more provable privacy properties. One of those is
differential privacy. This is the statistical science
of learning common patterns in the data set without
memorizing individual examples. This is a field that’s been
around for a number of years and it is very complementary
to federated learning. The main idea is
that when you’re training a model with federated
learning or in the data center, you’re going to use
appropriately calibrated noise that can obscure an
individual’s impact on the model that you’ve learned. This is something that
you can experiment with a little bit today in the
TensorFlow privacy project, which I’ve linked here, for
more traditional data center settings, where you might
have all the data available and you’d like to be able to
use an optimizer that adds the right kind of noise
to be able to guarantee this property, that individual
examples aren’t memorized. The combination of differential
privacy and federated learning is still very fresh. Google is working to
bring this to production, and so I’m giving
you kind of a preview of some of these early results. Let me give you a flavor of
how this works with privacy technology number five– differentially private
model averaging, which is described in the
ICLR paper linked here. The main idea is that in every
round of federated learning, just like what Emily
described for a normal round, an initial model will
be sent to the device, and that model will be
trained on that device’s data. But here’s where the
first difference comes in. Rather than sending
that model update back to the server for aggregation,
the device first clips the update, which is
to say it makes sure that the model update is
limited to a maximum size. And by maximum size, we actually
mean in a technical sense the L2 ball of in
parameter space. Then the server will add noise
when combining the device updates for that round. How much noise? It’s noise that’s roughly on
the same order of magnitude as the maximum size that any
one user is going to send. With those two properties
combined and properly tuned, it means that any particular
aspect of the updated model from that round might
be because some user’s contribution suggested
that the model go that direction or it might be
because of the random noise. That gives kind of an
intuitive notion of plausible deniability about whether
or not any change was due to a user versus the
noise, but it actually provides even a more
stronger formal property that the model that you learn
with differentially private model averaging will be
approximately the same model whether or not any
one user was actually participating in training. And a consequence of
that is that if there is something only
one user has typed, this model can’t learn it. We’ve created a
production system for federated computation
here at Google, which is what has been used by
the Gboard team in the examples that I’ve talked about today. You can learn more
about this in the paper we published at SysML this year,
“Towards Federated Learning at Scale– System Design.” Now, this system is still
being used internally. It’s not yet a system that
we expect external developers to be able to use,
but that’s something that we’re certainly very
interested in supporting. EMILY GLANZ: Awesome. We’re excited to share our
community projects that allows all to develop
the building blocks of federated computations. And this is
TensorFlow Federated. TFF offers two APIs, the
Federated Learning or FL API, and the Federated
Core, or FC API. The FL API comes
with implementations of federated training
and evaluation that can be applied to
your existing Keras models so you can experiment
with federated learning in simulation. The FC API allows you to
build your own federated computations. And TFF also comes with a
local runtime for simulations. So, earlier, we showed you
how federated computation works conceptually. Here’s what this
looks like in TFF. So we’re going to refer
to these sensor readings collectively as a
federated value. And each federated value has
a type, both the placement– so this is at clients– and the actual type of the data
items themselves, or a float32. The server also has
a federated type. And, this time, we’ve
dropped the curly braces to indicate that this is
one value and not many. This gets us to our next concept: a distributed aggregation protocol that
runs between the clients and this server. So in this case, it’s
the TFF federated mean. So this is a federated
operator that you can think of as a function,
even though its inputs and its outputs live
in different places. A federated op represents
an abstract specification of a distributed
communication protocol. So TFF provides a library
of these federated operators that represent the
common building types of federated protocols. So now I’m going to run through
a brief code example using TFF. I’m not going to
go too in-depth, so it might look a
little confusing. But at the end,
I’m going to put up a link to a site that
provides more tutorials and more walkthroughs of the code. So this section of code that
I have highlighted right now declares our federated type
that represents our input. So you can see we’re
defining both the placement, so this is at the TFF
clients, and that each data item is a tf.float32. Next, we’re passing
this as an argument to this special function
decorator that declares this a federated computation. And here we’re invoking
our federated operator. In this case, it’s that
tff.federated_mean on those sensor readings. So now let’s jump
back to that example where the model engineer had
that specific question of what fraction of sensors saw
readings that were greater than that certain threshold. So this is what that
looks like in TFF. Our first federated
operator in this case is the tff.federated_broadcast
that’s responsible for broadcasting that
threshold to the devices. Our next federated operator is
the tff.federated_map that you can think of as the
map step in MapReduce. That gets those 1s
and 0s representing whether their local values are
greater than that threshold. And, finally, we perform a
federated aggregation so that tff.federated_mean, to get
the result back at the server. So let’s look at
this, again, in code. We’re, again,
declaring our inputs. Let’s pretend we’ve already
declared our readings type and now we’re also defining
our threshold type. This time, it has a
placement at the server, and we’re indicating that there
is only one value with that all_equal=True, and
it’s a tf.float32. So we’re again passing that
into that function decorator to declare this a
federated computation. We’re invoking all those
federated operators in the appropriate order. So we have that
tff.federated_broadcast that’s working on the threshold. We’re performing
our mapping step that’s taking a computation
I’ll talk about in a second and applying it to the
readings in that threshold that we just broadcast. And this chunk of
code represents the local computation each
device will be performing, where they’re comparing their
own data item to the threshold that they received. So I know that was a
fast brief introduction to coding with TFF. Please visit this site,
tensorflow.org/federated, to get more hands-on
with the code. And if you like links,
we have one more link to look at all the ideas
we’ve introduced today about federated learning. Please check out our comic book
at federated.withgoogle.com. We were fortunate enough to work
with two incredibly talented comic book artists to illustrate
these comics as graphic art. And it even has corgis. That’s pretty cool. DANIEL RAMAGE: All right,
so in today’s talk, we covered decentralized data,
federated computation, how we can use federated computation
building blocks to do learning, and gave you a quick
introduction to the TensorFlow Federated project, which you
can use to experiment with how federated learning might work on
data sets that you have already in the server in
simulation today. We expect that you
might have seen, the TF Lite team
has also announced that training is a big
part of their roadmap, and that’s something
that we are also really excited about
for being able to enable external developers to
run the kinds of things that we’re running
internally sometime soon. We also introduced privacy
technologies, on-device data sets, federated aggregation,
secure aggregation, federated model averaging,
and the differentially private version of that, which embodies
some privacy principles of only an aggregate, ephemeral
reports, focused collection, and not memorizing
individuals’ data. So we hope we’ve given you a
flavor of the kinds of things that federated learning
and computation can do. To learn more, check
out the comic book and play a little bit
with TensorFlow Federated for a preview of how you
can write your own kinds of federated computations. Thank you very much. [APPLAUSE] [MUSIC PLAYING]
