Deep Learning in Medical Imaging – Ben Glocker #reworkDL

Thank you. I should say that I’m talking about
some successes and challenges coming from our group, the Biomedical Image
Analysis group at Imperial College London. The successes I’m talking about
are of course biased towards what we have done so far; there’s a lot of success
these days in medical imaging using deep learning. I want to use a few minutes
to reflect a little on why we are now looking into deep learning in medical
image analysis.

There’s really been this rise of deep learning, and I guess that’s the
reason why we are all here. So I ran a query on PubMed, the publication
database for medical and clinical publications. If you query for machine
learning in the title and abstract, you get this nice, fairly
predictable trend towards using more and more data-driven approaches. However,
if you do the same with deep learning, it’s less of a trend and
more of a disruption: there was basically nothing
at all, and then very suddenly there was this huge jump in the number of
publications. A few years ago there really wasn’t much, and I guess this is the
third Deep Learning Summit in healthcare, so you can see that the organizers were right at
the beginning of the trend when they started these summits.

Before deep learning, what we were mostly using were more classical machine
learning techniques, so here I stole the scikit-learn algorithm cheat sheet.
Scikit-learn is a very successful, widely used open-source
machine learning library for Python, with lots of classical machine learning techniques,
very nicely implemented, and they basically gave you this guide: whenever
you had a data science problem, you could follow it to find out
which technique to use, whether a classification, a regression or a
dimensionality reduction technique. And if you look at the literature
now, it seems you can basically do all of this with deep learning. In
medical image analysis in particular, one technique that we were using
quite a lot, and very successfully, before deep learning was based on decision trees.
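Because decision trees come up again below, including the point that a tree can be written as a neural network with two hidden layers, here is a minimal sketch of both views of one tiny tree. The tree, its thresholds and its leaf values are made up for illustration, and the mapping used (one step unit per split node in the first hidden layer, one AND-like unit per leaf in the second) is one standard construction, not necessarily the exact one the speaker had in mind.

```python
"""Toy decision tree evaluated directly and as a two-hidden-layer network (illustrative only)."""
from typing import Dict, List, Sequence, Tuple

# Hypothetical tree: a split node sends x left if x[feature] <= threshold, else right.
SPLITS: Dict[int, Tuple[int, float]] = {0: (0, 0.5), 1: (1, 0.3)}
CHILDREN = {0: (1, "leaf_c"), 1: ("leaf_a", "leaf_b")}  # node -> (left child, right child)
LEAVES: Dict[str, float] = {"leaf_a": 0.0, "leaf_b": 1.0, "leaf_c": 1.0}


def tree_predict(x: Sequence[float]) -> float:
    """Classic evaluation: follow a single path from the root down to a leaf."""
    node = 0
    while node in SPLITS:
        feature, threshold = SPLITS[node]
        left, right = CHILDREN[node]
        node = left if x[feature] <= threshold else right
    return LEAVES[node]


def leaf_paths() -> Dict[str, List[Tuple[int, int]]]:
    """For each leaf, the (split node, direction) pairs on its path; -1 = left, +1 = right."""
    paths = {}
    stack = [(0, [])]
    while stack:
        node, path = stack.pop()
        if node in LEAVES:
            paths[node] = path
        else:
            left, right = CHILDREN[node]
            stack.append((left, path + [(node, -1)]))
            stack.append((right, path + [(node, +1)]))
    return paths


PATHS = leaf_paths()


def step(z: float) -> float:
    """Sign-like step activation: +1 for z > 0, otherwise -1."""
    return 1.0 if z > 0 else -1.0


def network_predict(x: Sequence[float]) -> float:
    # Hidden layer 1: one unit per split node (weight 1 on its feature, bias -threshold).
    h1 = {n: step(x[f] - t) for n, (f, t) in SPLITS.items()}
    # Hidden layer 2: one unit per leaf. The weighted sum equals len(path) only if every
    # decision on the path agrees; any disagreement lowers it by at least 2.
    h2 = {}
    for leaf, path in PATHS.items():
        agreement = sum(direction * h1[n] for n, direction in path)
        h2[leaf] = 1.0 if agreement > len(path) - 1 else 0.0
    # Output layer: exactly one leaf unit is active, so a weighted sum returns its value.
    return sum(LEAVES[leaf] * h2[leaf] for leaf in LEAVES)
```

Making the tree deeper only adds entries to `SPLITS` and `LEAVES`, which widens both hidden layers; the network still has exactly two hidden layers, which is the width-versus-depth point made in the talk.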
Most clinicians are used to working with decision trees, and we found ways of
using randomized decision trees, which were applied very successfully to
machine learning and medical image analysis problems, but also to computer
vision problems. The reason why they are nice to use is that they are
quite interpretable: you have these decisions that tell you whether a
feature is useful for separating your data, and then in the leaf
nodes you make the final decision; you just follow a path down the decision tree.
This was very useful, and scaling it up was also quite possible by
building deeper and deeper trees that use lots of features
from your data, which we saw was quite useful in medical imaging in particular.
So there was a time before deep learning when everyone started to use
random forests for all kinds of things, because we saw that when we put together lots
of decision trees, we get very strong classifiers.

But then came deep learning, and it has really changed things a lot. These
days people wouldn’t consider a random forest as their first choice; they might
just go with a standard convolutional neural network as a baseline to
get an idea of how difficult the problem is. So where does the power
of deep learning really come from? If you compare decision trees, for instance, to
neural networks, we have known for quite some time that you can represent
a decision tree with a neural network that simply has two hidden layers:
all the decision nodes of the tree go on the first hidden layer and all the leaf
nodes on the second hidden layer, and that’s how you can exactly represent the
decision tree with a two-hidden-layer neural network. Now, if you build deeper trees,
which is where decision trees get their more discriminative predictive power from,
the interesting thing to notice is that you don’t get deeper networks, you simply
get wider ones. A deeper tree mapped to a neural network gives you a
wider network but not a deeper one, and depth is, I guess, where deep
learning techniques get their power. If you look at the models used these
days, you have lots of hidden layers stacked on top of each other, and this
stacking of hidden layers allows you to learn very complex
associations between your input data and your output data; that’s where the
power is coming from.

So, just to give you some successes from our own lab, where we
have applied deep learning techniques in the last couple of years, I want to show
you some results on brain lesions, fetal imaging and cardiac image
analysis.

The first one is on brain lesions. This is work that we
already started to present last year, by one of my
PhD students, Kostas, who is here in the audience (if you could raise your hand, yes),
so he can answer your detailed questions afterwards. He came up with
this architecture: we use a relatively standard convolutional neural
network, but made it possible to process large 3D images, since this is done on 3D
volumetric brain scans, and we use a multi-resolution approach because it helped us
to incorporate local information but also wider spatial context. That worked
pretty well: this network allowed us to build a system that is able
to analyze a 3D scan and give us a very accurate 3D map of where the network
believes there are lesions. Here this was done on brain lesions, in patients who have had
accidents and may therefore have contusions in the brain, but it works equally
well for things like brain tumors or ischemic stroke lesions. Here are
just some visual results; of course there is a proper evaluation of this in
our publications. Using this technique, we found that we can actually
apply it to quite different applications which are similar in nature.

So
the next application is one we did last year: Amir Alansary, another PhD student in our group,
used this network for fetal imaging. He came up with an architecture; this
is where the methodology part of a paper comes in, where you describe what the
architecture looks like and the adaptations you made to your previous
model, that is, the few steps needed to process that particular data.
Just to show you some results: this is again MRI, this time of
fetuses at a very young age. This is quite a new imaging technique,
developed together with King’s College London, and what we are
interested in here is really to analyze these images and to be able to extract,
for instance, the organs of the fetus, because this would allow us to use
them for diagnostic purposes. We want to see whether the
baby grows at a normal rate, so we want to analyze the brain, the spinal
cord and the inner organs. But we also found that with this technique we can do
the very challenging task of placenta segmentation. The
position and the shape of the placenta are quite crucial, and they allow
you to predict whether a pregnancy will follow a normal trajectory. So with
this technique we are now able to analyze these 3D scans and actually make
use of them in clinical practice.

The next success I want to
discuss is cardiac analysis, and this is work done by Ozan Oktay, who is also
somewhere here in the audience; he can raise his hand as well, so he can also answer
some questions. The problem with cardiac imaging is that if you want to image the
heart, it’s very difficult because it’s always moving. Normally, what you do
when you again use MRI, for instance, is a slice-by-slice acquisition: if
you put a slice at a certain position in the heart,
you get a nice 2D video. But in the end what we want is a 3D
volumetric video, basically a 4D image of the heart.
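Stacking thick slices gives fine in-plane resolution but a coarse grid along the slice direction. The simplest way to fill that grid in is linear interpolation between neighboring slices, the kind of standard upsampling that the talk compares against the learned super-resolution network below. A minimal sketch, assuming each slice is a small 2D list of intensities for a single time frame; the function name and data layout are illustrative and not taken from the work described.

```python
"""Linear interpolation between thick, stacked 2D slices (the baseline, not a learned model)."""
from typing import List

Slice = List[List[float]]  # one 2D image (rows of pixel intensities) at one time frame


def upsample_slices_linear(slices: List[Slice], factor: int) -> List[Slice]:
    """Insert factor - 1 linearly interpolated slices between each pair of acquired slices."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    if not slices:
        return []
    out: List[Slice] = []
    for lower, upper in zip(slices, slices[1:]):
        for k in range(factor):
            w = k / factor  # fractional position between the two acquired slices
            out.append([[(1 - w) * a + w * b for a, b in zip(row_lo, row_up)]
                        for row_lo, row_up in zip(lower, upper)])
    out.append([row[:] for row in slices[-1]])  # keep the last acquired slice unchanged
    return out


if __name__ == "__main__":
    # Two acquired 1x2 slices; a factor of 4 yields 5 slices along the through-plane axis.
    stack = [[[0.0, 10.0]], [[4.0, 10.0]]]
    for s in upsample_slices_linear(stack, 4):
        print(s)
```

Every new slice is only a weighted average of its two acquired neighbors, so this baseline cannot recover detail that was never acquired; that gap is what the super-resolution network is meant to address.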
So if you take another slice you get another video, and another slice
gives you another video. The problem is that you now need to stack them together, and
because the acquisition has to be done relatively quickly, as the heart is moving all
the time, you have to use thick slices. If you just stack them together as they are acquired,
instead of a nice 3D volume you have a stack of 2D slices
which are actually quite low resolution if you look at them from the
non-scanning direction. So Ozan came up with a network architecture
that allows us to upsample those images. It’s a super-resolution network:
it basically allows you to interpolate towards a higher-resolution 3D scan.
Here are just some visual results. On the left you see the original data,
next to it a standard upsampling using linear interpolation, and then
the result coming out of the deep neural
network. Comparing this to a true high-resolution
3D scan, you actually see that you get quite close; you even preserve
some of the smaller details nicely. Now you might wonder: if I can actually
acquire a full 3D scan, why would I want to upsample a lower-resolution stack of 2D slices?
The scan on the right of the slide is not something you would acquire in clinical
practice, because it takes much, much longer, it’s ECG-triggered, and it’s
not used in clinical routine. Now, you probably
wouldn’t use the upsampled images themselves for diagnosis, because the network kind of hallucinates some information: when you upsample an image, you might introduce something
that wasn’t there. So you wouldn’t use it for diagnosis, but what you can use it
for is further analysis. For instance, we found that you can use it for
image segmentation, so you can extract the 3D shape of the heart, and that is
better than doing it on the interpolated image and very close to the result
that you would get on the full high-resolution 3D scan. On the far right you
can see the discrepancy: in red is the result coming from the
linearly interpolated image, and in green the one from our upsampled 3D
image.

So this is all nice: we are having good success here, in the sense that we have
new applications of deep learning to exciting clinical problems and we can
write papers about them. But while we are doing this, we also realize that there are quite
a few challenges, and I guess today there was already a lot of talk
about different types of challenges. I want to discuss three challenges that we
saw in particular with respect to medical imaging. The first one is
about learning the right features, the second is detecting when things go wrong,
and the third is how we can go beyond human-level performance.

Learning the right features might sound weird in a deep learning context, because
you would hope that your artificial neural network would just do the right
thing: you give it the data, you don’t have to worry about handcrafting or
designing features, you just let it discover the right features from the
data. The problem is, and I want to show you this with a simple example, that this
might not always work. Very often, in medical imaging in particular, you might get
a training dataset from one site, one hospital, and it might look like this:
you have examples, maybe even labelled, of what is a malignant and what is a
benign cell or structure, and you want to learn features that discriminate between them. If you give that training dataset to a neural network, what it will learn is
that malignant objects have this orange color in their appearance and
benign ones are green. However, what usually happens when you
get new data from another site is that the data might look like this, and you suddenly
see that your trained network is completely failing on
your test data, simply because it learned the wrong features. There’s nothing
that would stop the neural network from going for color, because it is the most obvious
discriminative feature in your training data in the first place. Now you
might say that’s a problem with your training data, and yes, that’s right,
it is a problem with the training data. But in medical imaging it is simply not
possible to always have data that is representative of every possible
source of data. There might be a new scanner coming out that didn’t exist when you
built your product.

So Kostas again worked on extending his convolutional
neural networks, using something that has already been discussed a little today,
which is adversarial networks. We use them to do what is called unsupervised domain adaptation, and the idea here
really is that you train your standard neural network. Let’s say you want to
do brain tumor segmentation: you train an image segmentation network, a
convolutional neural network, that does this. At the same time, you
have another network that looks at the learned representations and
tries to figure out where the data is coming from.
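One common way to write the two-network setup just described is as a min-max (saddle-point) objective in the style of domain-adversarial training. This is a generic sketch in our own notation, not necessarily the exact loss used in the work described: $F_{\theta_f}$ is the segmentation network’s feature extractor, $S_{\theta_y}$ its segmentation head, $D_{\theta_d}$ a domain discriminator that outputs the probability that a representation comes from site B, $\mathcal{D}_A$ the labelled site-A data, $\mathcal{D}_B$ the unlabelled site-B data, $\ell_{\text{seg}}$ any per-image segmentation loss, and $\lambda > 0$ a weighting factor.

```latex
\begin{aligned}
&\min_{\theta_f,\,\theta_y}\;\max_{\theta_d}\;
  \frac{1}{|\mathcal{D}_A|}\sum_{(x,y)\in\mathcal{D}_A}
  \ell_{\text{seg}}\big(S_{\theta_y}(F_{\theta_f}(x)),\,y\big)
  \;-\;\lambda\,\mathcal{L}_{\text{dom}}(\theta_f,\theta_d),\\
&\mathcal{L}_{\text{dom}}(\theta_f,\theta_d)
  = -\frac{1}{|\mathcal{D}|}\sum_{x\in\mathcal{D}}
  \Big[d(x)\log D_{\theta_d}\big(F_{\theta_f}(x)\big)
  + \big(1-d(x)\big)\log\big(1-D_{\theta_d}(F_{\theta_f}(x))\big)\Big],\\
&\mathcal{D} = \mathcal{D}_A\cup\mathcal{D}_B,\qquad
  d(x)=1 \text{ if } x\in\mathcal{D}_B,\quad d(x)=0 \text{ if } x\in\mathcal{D}_A.
\end{aligned}
```

Maximizing over $\theta_d$ trains the discriminator to recognize the site (it minimizes its cross-entropy $\mathcal{L}_{\text{dom}}$), while minimizing over $\theta_f$ pushes the features to make that recognition fail, which is what makes them invariant to where the data comes from. Site-B images enter only through $\mathcal{L}_{\text{dom}}$, so they need no annotations. In practice such saddle points are often optimized with a gradient reversal layer or with alternating updates.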
Now, if you train such a system on labelled data coming from your
site A, together with another dataset from site B that doesn’t have to be labelled,
the first network, the image segmentation network, will
try to learn features which are invariant to where the data is coming
from. This allows you to mitigate exactly the problem from before: in the
previous example, it wouldn’t learn that color is the discriminative feature;
it would learn that it’s actually the shape that discriminates the two
structures, because that is what is invariant to where the data is
coming from. This worked really well. We tested it again on brain lesions,
where we had MRI brain data coming from different sites, basically two
different MRI scanners. On one of them they acquired
a certain set of MRI sequences, and on the other they acquired almost the
same set, but with a slight difference, because they had changed to a new sequence in
the MR acquisition. This actually made our previous network, trained on
data from the first scanner, completely fail on the second
scanner. However, after domain adaptation with the adversarial networks,
our network went back up to the performance it had on the first site, even on the data from the new scanner, without using any
annotations from it. Just training the network to discriminate, or rather not to
discriminate, where the data is coming from helped us to get back the
performance and learn the right features.

The second challenge is about
detecting when things go wrong. Very often, when you build these neural
networks, you have this development cycle: you build a neural
network, you do validation and testing, you look at your learning curves, you make
modifications to your network, and you live in this cycle. At least in
academia we often live in these cycles, and every n-th iteration we try to
publish a paper. The problem is that at some point you want to use these systems
in clinical practice, so you go to deployment and use your trained
network on data that is acquired in the hospital, and what you might find is that
it doesn’t work so well. The question is: how do you know that it
doesn’t work well? You could look at the images and try to do some sort
of manual quality control, or you could annotate data, but that’s not feasible
at scale. If you have a large-scale data analysis, you
can’t look at every single case and check whether your network worked
or not. So what we are trying to do here is basically to
automatically recognize when something goes wrong, not on average
but on a per-subject basis. So we came up with this framework, reverse
classification accuracy, which is of course based on previous work on reverse
testing and reverse validation, but the idea here really is to use it
within the medical imaging setup. There is normally some training
data, some reference data, for which we know what the true labels are,
or at least the ones that we believe are the best we can get. Those are
usually used to train a system, a machine
learning method, a convolutional neural network, something that you can then
deploy in practice. Now, on your new data you get a prediction, and the question is
how good that prediction is: every time, you want to know whether it failed or
did a reasonable job. You can use the prediction to train what we call a
reverse classifier, a classifier that is completely
overfitted to that one example, but which you can still use to make predictions, because
it was trained on a so-called pseudo ground truth, that is, on the prediction.
If you now go back to your training data, you can check how well the
reverse classifier works on it. Of course, it might not work
well at all, and it might not work on all of them, but it might
actually work on some of them, namely the ones that are very
similar to your test data. On those you can of course
quantify performance: you get a number, a quality score, a metric that tells you how well you could segment your training data using the prediction on your test data,
and we showed that this is actually well correlated with the true
performance that you would get on your test data. We tested this on multi-organ
segmentation in whole-body MRI, and in the meantime also on cardiac imaging.
Just to show you a plot: on the x-axis is the real Dice score, which is a
segmentation quality metric that measures the overlap between a predicted
segmentation and a manual ground-truth segmentation, and you see that the
predicted score that we get automatically correlates very well with
the real score. So you could use this to automatically detect whether a system has failed, by setting
a threshold on the predicted score.

So now the
last challenge I want to discuss is how we can go beyond human-level performance.
There is a lot of talk about reaching human-level
performance on many tasks, but the question is how we can go beyond that, and
what the limiting factors are, in particular in medical imaging. Obviously,
one of them lies in the data, where we say we have
ground truth; we never have ground truth, we have something that an expert provided
us. We try to find the best experts we can, but there will still be a
lot of variation in the quality of annotation. They have to sit down and
do this very tedious and time-consuming job of very detailed annotation of
medical images, and of course if we train methods that aim to replicate this, it’s very difficult to go beyond it. You can maybe reach the
performance of the best expert, but ideally at some point we want to use
this to be better than humans. I think one promising
direction is to really think about synthesizing ground truth: we don’t want to rely on experts and their time to do the annotations,
we want to generate data for which we know the ground truth, and this could be
done by synthesis. I think we have heard about GANs, which are one way of synthesizing
data, and we have maybe all seen some recent work on that. So the idea would
really be: what if we could draw a manual segmentation label map and have a
network that produces the corresponding image for us? Then we
really have matching input and output data, and it could be considered ground truth. So
this is work in progress. I don’t know how many people saw this last week
or the week before, when it came up on the web: people built a tool where
you can draw the outline of a cat and it will produce an image of a cat for you. It’s
called edges2cats, and I thought that’s exactly what I would like
to have for medical imaging: I want to draw something on the left and,
instead of a cat on the right, in my case I want to get an MRI scan. I
looked at a few of those examples, and they’re actually quite entertaining: you see
some nice results here, some not-so-nice results, and some scary results where you might get
a cat with a few more eyes. Obviously I was tempted to try this and
get my medical image out of it, but it didn’t really work, so I
guess there’s still a lot of work for us to do. But I hope that with these challenges I
gave you some idea of where we currently are with our research, and of
course thanks to all our funders.
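The reverse classification accuracy (RCA) idea from the second challenge, estimating per case whether a segmentation failed without having its ground truth, can be illustrated with a toy sketch. Here the images are 1D lists of intensities and the reverse classifier is a deliberately trivial nearest-mean intensity classifier fitted to one test image and its predicted labels; both choices, the function names and the 0.7 failure threshold are assumptions for illustration, and a real system would use a proper segmentation method on 3D images. Taking the best Dice score over the reference images follows the idea that the reverse classifier should work well on the references most similar to the test case.

```python
"""Toy reverse classification accuracy (RCA): estimate segmentation quality without ground truth."""
from typing import List, Sequence, Tuple

Image = List[float]  # hypothetical 1D "image" of intensities
Mask = List[int]     # binary labels, 1 = structure of interest


def dice(a: Sequence[int], b: Sequence[int]) -> float:
    """Dice overlap 2|A & B| / (|A| + |B|); defined as 1.0 when both masks are empty."""
    inter = sum(1 for p, q in zip(a, b) if p and q)
    total = sum(1 for p in a if p) + sum(1 for q in b if q)
    return 1.0 if total == 0 else 2.0 * inter / total


def fit_reverse_classifier(image: Image, pseudo_labels: Mask) -> Tuple[float, float]:
    """Overfit a nearest-mean classifier to ONE test image, using its prediction as pseudo ground truth."""
    fg = [v for v, lab in zip(image, pseudo_labels) if lab]
    bg = [v for v, lab in zip(image, pseudo_labels) if not lab]
    if not fg or not bg:
        raise ValueError("the prediction must contain both foreground and background")
    return sum(fg) / len(fg), sum(bg) / len(bg)


def apply_reverse_classifier(model: Tuple[float, float], image: Image) -> Mask:
    """Label each pixel by whichever class mean its intensity is closer to."""
    fg_mean, bg_mean = model
    return [1 if abs(v - fg_mean) < abs(v - bg_mean) else 0 for v in image]


def rca_score(test_image: Image, prediction: Mask, reference: List[Tuple[Image, Mask]]) -> float:
    """Best Dice the reverse classifier reaches on reference images with trusted labels."""
    model = fit_reverse_classifier(test_image, prediction)
    return max(dice(apply_reverse_classifier(model, img), gt) for img, gt in reference)


if __name__ == "__main__":
    reference = [
        ([0.1, 0.2, 0.9, 0.8, 0.1], [0, 0, 1, 1, 0]),
        ([0.2, 0.9, 0.9, 0.7, 0.2], [0, 1, 1, 1, 0]),
    ]
    test_image = [0.1, 0.8, 0.9, 0.2, 0.1]
    good = [0, 1, 1, 0, 0]  # plausible prediction
    bad = [1, 0, 0, 1, 1]   # inverted, i.e. a failed prediction
    threshold = 0.7         # hypothetical failure threshold on the predicted Dice
    for name, pred in [("good", good), ("bad", bad)]:
        score = rca_score(test_image, pred, reference)
        print(name, score, "FLAG" if score < threshold else "ok")
```

The inverted prediction scores low because a classifier trained on it cannot segment any reference image well, which is the kind of per-case failure signal described in the talk; in this toy the threshold is arbitrary and would need calibrating against real Dice scores.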
