Machine Learning in Health Care



Each year Microsoft Research hosts hundreds of influential speakers from around the world, including leading scientists, renowned experts in technology, book authors and leading academics, and makes videos of these lectures freely available.

I'll get started with my introduction while the last couple of people are coming in. My name is Katja Hofmann, I'm a researcher here in the Machine Learning and Perception group, and today I have the honorable task of introducing to you Antonio Criminisi, who is a colleague in the same group. Antonio is a principal researcher; he's been with Microsoft Research for a long, long time, and today he will give us an overview of some of his latest research. When I asked Antonio why he's focusing on healthcare, he said it is because machine learning for healthcare is the future. There are some very interesting developments there. On the one side we have medical experts, who are very, very good at interpreting the data that they see and the information that they get from models. On the other side we have machine learning algorithms that don't really have that expertise: they can very effectively extract patterns, but don't really know what to do with the patterns they have extracted. So Antonio will give us new insights into how to make this combination work, and how to generate very interesting insights from the combination of human expertise and machine-learned models.

Thank you very much, Katja. Thank you for being here. Come on in, come on in. The problem is that lunch was too good. Fantastic, people are still coming in, that's good. All right, thank you for being here. So you're stuck with me for about 50 minutes. Feel free to interrupt and ask questions; it's a very informal talk, a very informal setting, so hopefully we will learn something from this. So I'm in the Machine Learning and Perception group, working here in this building, and indeed in the past five years or so I've been applying some of what we know in the machine
learning and computer vision space to the automatic analysis of medical images. And by medical images I mean any medical image: it could be X-rays, computed tomography, magnetic resonance, or even images of the patient captured from the outside.

I'm assuming right now that we know very little about machine learning, so I will have a sort of slow introduction, very pictorial, not much math, to make sure that we're all on the same page and that we understand the basic principles of machine learning, in particular machine learning applied to the analysis of images. I start with a very simple toy example. Imagine that we want to separate images of sharks from images of something else; it could be lizards. So here we have collected some examples of sharks and lizards, and immediately, as you look at those images, you say: oh, this task is really easy, because I can just measure how blue the overall image is and be able to separate one from the other. And that's absolutely true for this toy example. I can represent each image in a very simple, very low-dimensional feature space like this: I can compute how much red I have in the image and how much blue I have in the image. Each of these squares represents one of these images (this is actually computed from these very images), and so you can see that separating these images becomes a very trivial task.

Of course this is a very simple toy example, just for explanation. Computer vision is not so easy, and figuring out the content of an image, that it actually contains a lizard or a shark, is never so simple, especially when we move on to a large number of classes and more complex images. So typically what happens is that you want to represent images in a much higher dimensional space, and as an example, this
is, again, a three-dimensional space, where we measure three features rather than one. When the task becomes really, really complicated, it is impossible to design features by hand and hope to be able to capture an entire image, or even a video, in a low-dimensional space like this one. So we have to resort to machine learning.

What does that mean? It means that we take some measurements from the data that we have, in this case images, but we also have to have some labels. So this could be what is called the training set, and the labels here are represented by these colored circles, where blue indicates class one and this tan color represents class two. The whole purpose of machine learning (I have to say, of a very specific class of machine learning, which in this case is supervised machine learning) is to learn a generic function (where is it? here we go), this green curve, that separates the two classes in some optimal sense. Now of course we want the separating function to work well not just for the training examples, the squares and labels here, but also for any new image that I haven't yet seen. That's the problem of generalization.

This is pretty much what was done for Kinect. When you go and buy an Xbox and buy Kinect and start playing Kinect games, machine learning is what drives everything. In what way? The Kinect depth sensor produces these sorts of images, where each pixel is associated with a gray value that indicates how far that 3D point is from the camera. So you see different shades of gray, from dark to bright, where dark means close to the camera and bright means far away from the camera. What we did is collect a very large repository of depth images where, for each pixel of each of those images, we had associated a
label that told us whether that pixel belonged to the right arm or the left arm, the right hand or the left hand, the right side of the head or the left side of the head, and so on. We had 31 different classes for the different body parts. So what we did is train a classifier that takes a pixel in the depth image and its context (it's very important to look not just at the individual pixel but at what surrounds it as well) and predicts a per-pixel class. Earlier we were talking about a prediction per image; in this case it is per pixel, so that's a big difference, and it's a lot more computation. All of this was done with a classification algorithm that was extremely efficient. Now, when I talk about classification, again, that is a subset of supervised machine learning: there are many supervised machine learning algorithms, and classification is one subset.

So how exactly was that done, and what type of classifier did we use? This very woolly, iconic representation of a classifier now becomes a decision tree. Probably most of you know what a decision tree is: it is a data structure which comprises a root node, then some split nodes, or internal nodes, and then some leaf nodes, also called terminal nodes. What does that have to do with classification? We have probably learned that this can be used to represent very efficiently data which is of a hierarchical nature. How can I use this for classification?

Well, imagine again that what I have is a picture (this is a photograph of two of my many kids), and we want to figure out whether this picture represents a scene that was taken indoors or outdoors. You can say: oh, I know how to do this, it's a really simple task, I'm very proficient in MATLAB or C++, I can write this in no time. And so I start writing down code, and I think with my human
brain and I say: oh, the first thing I can do is check whether the top half of the image is blue. If it is blue, that's likely to be sky, and therefore it is likely to be outdoors. So effectively what I do is start constructing a list of nested if-then statements. The first statement is: is the top part of this picture blue or not? If it is, then I travel here, and then I say: oh, wait a minute, I should perhaps also check whether the bottom half of the picture is blue, because if it is also blue then perhaps I'm just staring at a blue wall, so maybe it is indoors. And so you start writing these if statements one after the other. Every time you test your algorithm there are always cases where it doesn't work, and then you think: oh, okay, there is an exception, and then another exception, and effectively you end up with a very, very big tree of if-then statements, and that's your program. And unfortunately it never works. Sorry to break the news, but computer vision is a lot harder than that, and every time you think of a rule there is always an exception. It's a little bit like Latin, for those of you who studied it: same thing.

So how do you fix this problem? You fix it through machine learning. What we want to do is collect a lot of images which have been labeled by humans, with what we call ground-truth labels, as being indoors or outdoors. We stick with a decision tree type of architecture, but we want to be able to learn the questions to ask and where to ask them, and also the structure of the tree itself: how deep it is, whether it is balanced or unbalanced, where to ask questions, what type of questions. All of that has to be learned automatically. So you can see that machine learning in some instances is not very different from writing stuff yourself in a program, except that the parameters of this program can all be learned automatically. And this is particularly true for one
side of machine learning, which is that of discriminative models. (Oh, that's the wrong keyboard; I can use this. Okay.)

So let me walk through another toy example; we'll have lots of toy examples, to try to make you feel familiar with at least this type of machine learning technique. Imagine that we have a two-dimensional representation of the data, so x1 and x2 are my two coordinates, my features if you want to call them that, and we have four classes, color coded green, red, blue and purple. Apologies if blue and purple are not very easy to distinguish. What we can do is try to learn a decision tree that separates those classes in a way that generalizes. So we start by sampling, completely at random, a hyperplane (a line in the 2D case) and ask the question: does this line separate the classes well or not? Well, it goes through all the blue points, so it doesn't do a particularly good job. So we can randomly sample a different line and ask: does this separate my data well or not? And we can keep doing this until we get bored, pretty much, or depending on how much computational power we have. Eventually we find a line which in this case separates two classes from the two other classes in a decent way, without cutting through any of those clusters. And so we say: okay, I like this line, I'm going to keep it. You associate this line (in a high-dimensional space it's a hyperplane) with the root node, so that is the first test. We say: this works well, I'm reasonably happy with it, I fix it, I freeze it, I store it in the root node, and now I proceed. What happens now is that I send all this data to one child (we assume this is a binary tree, so we have only two children) and all this other data to the other child: the left and right children. And so I proceed, and now I test some new hyperplanes.
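The random search over candidate lines can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the talk: the function names, the made-up toy clusters, and the choice of entropy-based information gain as the score for each candidate line are all hypothetical choices made for this example.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    return (entropy(parent)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))


def best_random_line(points, labels, n_candidates=200, seed=0):
    """Sample random lines w.x = t and keep the one with the highest gain."""
    rng = random.Random(seed)
    best_gain, best_w, best_t = -1.0, None, None
    for _ in range(n_candidates):
        angle = rng.uniform(0.0, math.pi)      # random orientation of the line
        w = (math.cos(angle), math.sin(angle))
        proj = [w[0] * x + w[1] * y for x, y in points]
        t = rng.uniform(min(proj), max(proj))  # random offset within the data
        left = [lab for p, lab in zip(proj, labels) if p < t]
        right = [lab for p, lab in zip(proj, labels) if p >= t]
        gain = information_gain(labels, left, right)
        if gain > best_gain:
            best_gain, best_w, best_t = gain, w, t
    return best_gain, best_w, best_t


def make_toy_clusters():
    """Four small, well-separated clusters, one per class (made-up data)."""
    centers = {"green": (0, 0), "red": (0, 10), "blue": (10, 0), "purple": (10, 10)}
    offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0.0, 0.0)]
    points, labels = [], []
    for name, (cx, cy) in centers.items():
        for dx, dy in offsets:
            points.append((cx + dx, cy + dy))
            labels.append(name)
    return points, labels


if __name__ == "__main__":
    pts, labs = make_toy_clusters()
    gain, w, t = best_random_line(pts, labs)
    print(f"best gain: {gain:.3f} bits, normal: {w}, threshold: {t:.2f}")
```

On these toy clusters, a line that separates two classes from the other two without cutting through a cluster has a gain of 1 bit, which is the most a single binary split can achieve. Growing a tree would then repeat the same search on the data sent to each child.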
And you see that these hyperplanes are only half-planes: they stop at this boundary, because we know the boundary already. I keep testing all these different hyperplanes and I say: okay, I found another one that works well. Now I look at the other child, recursively, and eventually I like this one and I stop. They don't need to be absolutely perfect, the nodes don't need to be absolutely pure; even if there is a little bit of impurity left we can deal with it, because we will see soon that we can use multiple trees rather than one single tree. And every time we test new hyperplanes, when I said we can check whether it works well or not, this is done computationally by measuring a quantity called information gain, which is a function of the purity of each node, which can be computed as the entropy. Probably most of you are familiar with this; it's just one way, and not the only way, of measuring whether the training data that arrives inside a bucket is relatively pure or not.

So this is not new. A lot of people have been doing this sort of work, and decision trees are very, very old techniques. The difference is that in recent years, predating Kinect, people have been applying decision trees at scale, with lots and lots of data. We moved on from a world where these decision trees were designed by hand. For instance, in the medical space decision trees are still used now to say: okay, my patient has got a headache, all right, let's try this test; if it is positive then let's try this other test, and if it is negative we will try a different test. So you have this concept of hierarchy, and therefore of conditional computation: you run certain tests only if the previous test in the chain is positive, and not if it is negative. But now these have been applied at scale, on thousands and hundreds of thousands
if not millions of images, as in the case of Kinect. Also, thanks to advances in computing power, we've been able to test different variants of decision trees. For instance, one important element of the decision tree is the split function itself. Earlier I gave you an example where we were using this sort of split function, oriented hyperplanes. But actually in Kinect, in order to get the most out of the small computational power you have in your Xbox, we use something which is even cruder and simpler, which is axis-aligned hyperplanes. This really corresponds to taking one feature at a time and comparing it to a set threshold. It reduces the complexity of the model, the number of parameters that you have to search over, but it has some drawbacks, and we will see in a minute what those drawbacks are. On the other end of the spectrum, we can think of associating each node with a more and more complex split function, which could be a curved surface in this very high-dimensional space. These are just some of the many things that you might want to change in your decision tree model.

Eventually, when you train a model of this type, this is what you end up with: a tree that looks a little bit like this. You have a root node and then split, split, split, lots of split nodes. The tree can become very large very quickly, which is a problem. In this case we have, what is it, sixteen or twenty levels, something like that. And the tree is not necessarily balanced, so there will be some branches which grow a lot further than others, just because your information gain tells you that the nodes are still very impure: there is still information to be gained by splitting the data further and further. Once you have trained a tree like this, you freeze it; you put it on your, you
know, hard disk, and now at runtime you test it. At test time what happens is that a new image that has never been seen before is pushed through the tree. Like I said before, the first test is applied: you read off the test stored at the root, it is applied, and then this input data gets sent to the right child in this case. Then here a new test is applied, it gets sent to the left child, and so on, and it navigates its way through this hierarchical structure until it reaches a leaf. And what happens at the leaf? During training you would have stored the statistics that you have observed, the empirical distributions that you have gathered from the training data itself. So in this case it says that the training data that arrives at this leaf is mostly of the green class, then there is a little bit of the red class and a lot less of the blue class. So during test time you reach that leaf and you say: okay, this is the probability associated with that test input, say an image.

And then what happens is that you can have multiple trees. That's the key idea of a forest: a forest is nothing but a collection of trees. All these decision trees can be trained in parallel, on a parallel architecture if you want, and therefore they can all be trained very efficiently, and with some level of randomness. While you train the trees you make sure that each of them individually produces some good level of prediction, but at the same time they are all slightly different from one another. And the fact that they are slightly different from one another helps you achieve higher generalization for the test data that hasn't been observed beforehand. So how do you combine the outputs of these many trees? For instance, here is an example.
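Before moving on to combining trees, the test-time procedure for a single tree (push an input through the stored tests until it reaches a leaf, then read off the class distribution stored there) can be sketched as follows. This is a minimal illustration, not the implementation discussed in the talk: the `Node` class, the axis-aligned tests, and every threshold and probability in the hand-built tree are hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Node:
    """A split node (feature, threshold, children) or a leaf (class_probs)."""
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    class_probs: Optional[Dict[str, float]] = None  # set only on leaves


def predict_proba(root: Node, x: Sequence[float]) -> Dict[str, float]:
    """Push x down the tree and return the distribution stored at its leaf."""
    node = root
    while node.class_probs is None:
        # Axis-aligned test: compare one feature with the stored threshold.
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.class_probs


def make_example_tree() -> Node:
    """Hand-built two-level tree; all thresholds and probabilities are made up."""
    mostly_green = Node(class_probs={"green": 0.7, "red": 0.2, "blue": 0.1})
    mostly_blue = Node(class_probs={"blue": 0.9, "red": 0.1})
    all_purple = Node(class_probs={"purple": 1.0})
    inner = Node(feature=1, threshold=0.5, left=mostly_green, right=mostly_blue)
    return Node(feature=0, threshold=0.5, left=all_purple, right=inner)


if __name__ == "__main__":
    tree = make_example_tree()
    print(predict_proba(tree, (0.8, 0.2)))
```

In this sketch the input (0.8, 0.2) goes to the right child at the root, to the left child at the next node, and ends in the leaf dominated by the green class. In a trained tree the leaf distributions would be the empirical class frequencies of the training data that arrived at each leaf.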
Imagine that we have four trees, and imagine that right now we're in a regression setting rather than a classification one. Regression means that your prediction is a continuous value. We call that value y, and v is my high-dimensional input feature vector. So each tree effectively computes the conditional probability of y given the input vector, and imagine, for ease of argument, that at each leaf we model this conditional probability as a Gaussian, so it's a bell function that looks a little bit like this. You push the same input data through all four of the trees; it will reach four different leaf nodes, and at each of these leaf nodes you will read out a different Gaussian distribution. Let's imagine that these are the four different Gaussian distributions. They are all normalized, and they are color coded corresponding to the four different trees. What you notice is that some of them are more peaked around a certain mean value and some others are more widespread. This indicates that some trees, for that data, are much more confident about the prediction than others.

Now, typically the way you merge different individual tree predictions together to get a single forest prediction is very simply by averaging these distributions. If you do that, you end up with this black curve, where what you see is that the highest-confidence tree has got the most weight, the most mass, in the final forest prediction, while the ones which are very uncertain will have much less influence on the final prediction, which is precisely what we want. But of course simple arithmetic averaging is not the only thing we can do. We can do geometric averaging, multiplication between the different distributions, and this would lead to very different behaviors for
the forest itself. All I'm trying to say is that we have done a lot of work in analyzing how the behavior of the random forest changes if you change different parts, different sub-models, within the tree model. For instance, what happens if you increase the depth? Here is another, slightly more complex toy example. Again we have a 2D space, and we have these four spiral arms representing four different classes, and we need to learn to separate those classes. Now, on the training data alone all these models work perfectly well, they score 100%, but their behaviors outside the training area, near the corners of this square, are all very different. You see that if we use the sort of split functions that we used in the Kinect application, where you have axis-aligned features, you start to get very weird, blocky artifacts. If you use oriented lines, oriented hyperplanes, you have a level of generalization that, at least visually, people will feel a little more comfortable with. And if you use conic sections you have probably an even better behavior, especially if you look at this corner here compared to this one: you see that here you get more gray, and the gray color indicates high uncertainty, which is precisely what you want to get as you move farther away from the training data.

So what I'm trying to say here is that, as many of you would know, it's not enough to just achieve 100% accuracy on the training data. It is also important to look at the accuracy of my confidence: how confident am I when I move away from the training data, and what is the accuracy of my predicted confidence? That is a very tricky thing to measure, it's very difficult, but at least here we have some visualizations that allow us to make sense of that.

There is code available; you
don't need to buy my book, you can download the free version of it, so don't worry about that. The code is also freely available for research purposes, and both the book and the code are designed with first-year PhD students in mind, so hopefully it's all very accessible. It gives you a nice, gentle introduction to machine learning with applications to computer vision and medical image analysis.

Now let me talk about some variants of decision trees and decision forests. One variant which is particularly interesting to us is what we call jungles; sorry for the lack of imagination. What happens with decision trees is that, although they are very useful in practice, they are still limited. In particular, one way in which they are limited is this combinatorial explosion. As the tree grows and I use more and more levels of depth (and having big trees is useful in real-life applications in computer vision, such as semantic segmentation of images), the number of nodes you need to store and run computations for grows as a power of two with the depth, so it grows very, very quickly. But an even bigger problem is the fact that quite quickly we run out of training data. We have a training set of limited size. It gets pushed into the root node, and then on average it gets split in half: half of it travels to the left sub-branch and the other half to the right, and then you subdivide each of those in half again, so you have quarters, and so on. Very quickly you just run out of training data, so you cannot grow these trees to a depth that you would find ideal.

So what do you do? A simple idea would be to try to merge some of the nodes together, so that the outputs of this node and this node get merged into a single node rather than having two separate nodes, and
now the question is: okay, that's an interesting idea, because it allows us to use the same amount of training data more efficiently, but how do I merge things? These new structures are not trees anymore: they are directed acyclic graphs, called DAGs for short, and an ensemble of those is what we call a jungle. They help both because now we can constrain the memory use (we can constrain the maximum number of nodes we have in the DAG and in the jungle) and, hopefully, because reducing the overall number of parameters controlling this classifier might also have positive effects on what is called generalization. I'm not going to go too much into the details of that, but I'm happy to discuss it with you.

So let me explain a little bit, again with another computer vision based toy example, how this might work. Imagine that we have images of pastoral scenes, say in Grantchester, and we have animals of two different types, cows and sheep. Looking at patches of these images, we want to figure out whether those patches belong to the grass class, the cow class or the sheep class. And of course we want to be able to capture appearance variations within those classes as well. For instance, the grass in this case appears a very saturated dark green, and in this case it is very bright, almost brownish-yellowish. Let's imagine that we represent these patches in our toy two-dimensional space, where we measure brightness on the x axis and here some form of chromaticity. Then these patches correspond to these points, and let's imagine that we have collected many such training examples. You might be able to see some very faint ellipses: here is one for the cow patches, here is one for the sheep patches, and then the grass, because it has got a very wide variation in appearance: we have some training
examples here, nothing in here, and some in here. If we train a tree to separate these classes, we might start with this sort of split function at the root, then we can proceed with the left child here and the right child here, and then we split off these two children again, here and here. What we end up with is these regions. Again, the data that arrives in each bucket is reduced very, very quickly: in this case we have only one cow patch arriving in this bucket, so this entire bucket gets assigned the label cow, and this one sheep. And clearly, at least visually in this toy example, it doesn't make much sense, because this cow and sheep live in between the training examples of grass. So this is not great.

If instead we use a DAG representation, we might start in exactly the same way as we did with the tree; this is exactly what we saw before. But now we say: oh, look here. This bucket contains mostly grass examples, and there is also (sorry, this is the wrong way around) one triangle, which is a sheep. And this one has also got mostly grass examples, and there are some impurities, other stuff happening. Because of this semantic similarity, this statistical similarity, between these two buckets, I can decide to actually merge them together. This gives rise to a DAG, because I'm merging nodes together. And because I'm merging them together, now everything in between here, everything in this entire bucket, gets labeled as grass, and therefore you get better generalization within this sub-manifold, if you want to think about it that way. So this is the final DAG, which generalizes more correctly and captures the illumination invariance, and this is sort of what it would look like in practice: we have got this merging happening here.

All right, this is all very
toy, introductory sort of examples. Now I would like to spend a few minutes talking about applications. I'll probably go through these a little more quickly, because there is a lot of detail here, but I just want to capture the gist of how these things can be used in practice.

One first application we worked on in the medical space is that of anatomy localization. Quite often in an A&E department of a hospital, people are rushed in because they have been the victim of some sort of accident, a crash. The first thing that happens is that the doctors don't know exactly what's going on, so they put the patient inside a CT scanner, a computed tomography scanner, which produces these sorts of images. A big chunk of your body is imaged in this way, so these are called full-body images, although they are not entirely full body. The first thing the doctors need to do is search through these images to figure out very quickly whether you've got broken bones, or whether something else has moved around in ways that are not entirely healthy. Doing this with traditional tools is very time-consuming, and time is very critical at this stage. What we propose here is a way to automatically analyze the image and then just navigate through it by saying: oh, I want to look at the kidneys, I want to look at the spine, I want to look at the heart. Imagine having a menu where you can just click on the different organs. In order to do so, the computer needs to understand what the organs look like and where they are. So we put together a system that, given a CT scan like this (it is actually in 3D), can automatically detect and zoom into the relevant organs, and we can do this very quickly and very effectively for roughly 50 or so anatomical structures. So it's very efficient: it takes
only 2 seconds on a standard desktop computer to do so. The problem is very hard because we all look different, not just on the outside but on the inside too. We have this preconception that this is easy because a kidney is a kidney; how different can it be? Well, some of us have got very tiny ones and some of us have got very big ones, guess what, and that is even when we are completely healthy. And of course there are lots of anomalies: in this case there are massive cysts, in this case there are other cysts somewhere else. The images can be captured with or without contrast agent, and they might come from very different scanners at very different resolutions, depending on how much time there is to capture the images, and so on. So we need to be robust to all those changes. Although in the first instance you might think that this is a much easier problem than classifying photographs into cows and sheep and penguins, it is actually still a very hard problem, and we use machine learning to do this.

In particular we use a form of decision trees which are regression trees; I mentioned those earlier. What we want to do is, given an anchor point, a key point which is easy to distinguish in the body, predict the continuous variables which, from that anchor, give us the position of, say, the bounding box for the left kidney or the right kidney, and so on. All of this is done using a probabilistic version of decision trees, in which the leaves encode Gaussian distributions for the displacements of the six sides of each bounding box from each reference point. And the reference points themselves (anchors or landmarks, whichever way you want to call them) are also learned as part of this end-to-end procedure. So this is what we end up with. We end up
with our own 3D visualization of a CT scan, a full-body scan. The green areas highlighted here represent the automatically learned anchor regions, which the system in this case learns to use because they produce the highest-confidence estimates for the position of the kidneys. When you see this spinning around (perhaps I can make it spin around again), you see that these green regions correspond to the top of the pelvis and the bottom of the lung. That makes a lot of sense, because those are transition areas between, say, bony structure and muscle, or air and muscle, which are clearly very distinctive visually, and they are also very close to the kidneys. So intuitively you think it makes a lot of sense to use these as anchor points to figure out where the kidneys are. But all of this was learned completely automatically.

Another application that has been quite successful... By the way, this is work that we've been doing with Addenbrooke's Hospital, which is here in Cambridge and is one of the biggest research hospitals in Europe. They've got masses of data and they're very keen to collaborate with us at the research level. We've been very lucky to be working with some doctors, in radiation oncology in particular in this case, who not only understand medicine inside out but in their free time play with MATLAB, which doesn't happen very often.

So in this case we're talking about brain images. These are magnetic resonance images of the brain captured with six different modalities, each of which shows a different aspect of the tissue. In this case even the untrained eye can immediately see that there is something really unusual going on in the brain. Unfortunately that is a very nasty tumor called glioblastoma, and it is incurable. The unfortunate people who get it don't have much
life left; there is very little one can do. One of the reasons why the prognosis is so bad is that doctors don't have very good ways of measuring whether a certain therapy is working or not. Although doctors are particularly good at looking at these images and saying "this is definitely glioblastoma", they cannot say with accuracy how big it is. They cannot say whether the therapy they started two weeks ago is actually working and, if it is working, by how much. How much is the tumor shrinking? Is it 1 percent or 10 percent? They do not have good quantitative tools.

Computer vision and machine learning can help in that sense, for instance if we can segment out the tumor, and not just the tumor as a whole but also its different compartments, the different tissue types. In this case, in green we have highlighted the necrotic core: those are brain cells which are just dead, and there is nothing you can do about them. Then there is the red area, which is called the active rim, the actively proliferating area: those are the most active tumor cells, which draw in a lot of blood and oxygen and proliferate very quickly. And then there is the edema, which is the surrounding area, mostly composed of healthy brain cells, but they are swollen, they draw in a lot of water. Unfortunately, in this area you might also have some tumor infiltration already.

Ideally each of these regions should be treated with a different therapy: some respond better to radiotherapy, others respond better to chemotherapy. But doctors don't know this, because they cannot extract these sorts of maps. Now here we have a tool which is very promising.

So what we do is collect a lot of patient data. Each column here is a different patient, and you can see the variability of the tumor. These are of course
horizontal slices through the patient's brain. We also had experts label these tumors pixel by pixel in 3D. That is very expensive data to capture, but we were lucky enough to get it. Then we trained our decision forest (sorry for the cheesy animation), and once we have trained it we store it on disk and can apply it at run time. When a new patient comes in with a tumor of the same type but in a different place in the brain, with a different appearance and different ratios of necrotic core versus active rim versus edema, we push the images into the random forest and a classification comes out as a segmentation map. So we can say exactly where the edema, the necrotic core and the active rim are.

We've done this for a bunch of test patients that were never seen by the training algorithm, and these are the sort of correspondences that we get. Here we have the ground-truth manual segmentation, which is itself not perfect because it was done by humans, and even experts might disagree somewhat, and these are the automatically obtained maps. It's quite encouraging to see this working, and it works very efficiently because it's all based on random forests.

These are some visualizations that are most useful for non-experts, people like most of us, I suppose, who are not medical doctors, to see the 3D shape of these things. But they're also useful for surgery planning: I've been working with some surgeons who, looking at these 3D images, can better plan what to cut and what to leave.

So perhaps I have time to talk about a final project, and then we'll have plenty of time for questions. Another big problem in medicine is the actual data acquisition. Hospitals, in the Western world especially, spend a lot of money buying very
expensive scanners: magnetic resonance scanners, PET scanners, CT scanners. They're manufactured by a few big companies like Siemens, GE, Philips and some others, and each of them costs many millions. The running costs are also very high, so each patient who gets scanned in one of those scanners has to be chosen carefully, because it costs a lot of money.

But also, especially with magnetic resonance, the time required to scan a patient is quite long. It goes from 10 minutes, if it's not a major organ that they need to scan and there are not many sequences, to many hours, depending on what you need to do. In clinical practice they try to keep the scanning time within one hour, but that means that a patient who is going through a painful time of their life needs to be stuck, completely stationary, inside a magnetic resonance scanner, which is a very small, confined space, for one hour. Elderly patients or very young patients cannot really do that. So anything we can do to acquire those images more quickly would be absolutely welcome. Of course you can degrade the image quality and get the scanning done a lot more quickly, but what if we want to maintain the high resolution of the images?

So we came up with an idea to super-resolve images. This is a very special type of magnetic resonance image of a brain. It is called diffusion MRI, and effectively, for each small volume in your brain, it measures the diffusivity of water. Think of it as: what is the principal direction in which water can move within a small section of my brain? That corresponds to fiber tracts. Especially in the white matter you have a very anisotropic sort of
structure: water diffuses only along the fibers, the connective tissue, in the white matter. The gray matter instead is more like mud, there is no structure in there. Having a good visualization of these structures, of the fibers, is very important to figure out what is connected with what. Especially if there is something like a tumor, it's very important to figure out what that tumor is impinging on, whether it is impinging on areas to do with speech or with motor skills, and so on.

So here you see a low-resolution, very quickly acquired scan of a patient's brain, and here is the same thing at much higher resolution. The color coding indicates the different orientations. We can try to learn this mapping from low-resolution images to high-resolution images. It is all done in 3D, using this simple formulation: we have a low-resolution grid, the blue dots, and from that we want to predict the red dots, which are double or four times the size. We do this using, once again, regression trees, regression forests.

These are very difficult to see, but here are the results of our forest interpolator, if you want to think of it that way, and this is the original high-resolution data. Visually, a little bit, you can see that here we retain some high-frequency information at the edges compared to other techniques, and numerically too we show that we do better. And this is yet another, rather unusual technique, not used in clinical work yet, called NODDI; it's a different type of magnetic resonance modality. So these are the numbers, all good.

I want to conclude here, so that we have some time for questions if you have any, by saying that this work is not only super cool but also extremely useful. Things are changing; the landscape
is changing as we speak. There is all sorts of crazy stuff happening out there with social networks, machine learning and artificial intelligence, but I bet that one of the next big things, if not the next big thing, will be a revolution in the medical sector. There is a huge amount of data which is completely untapped. Machine learning and AI techniques applied to that data really promise to revolutionize the way in which healthcare is delivered, reducing the costs as well as benefiting the patients. So if you're looking for a job, this is a really good area to learn more about and put your skills into, and I'm sure you'll do well. So, any questions?

Yes, I have a question. The presentation is super, great. The examples you gave are all related to a snapshot in time, where you have a certain image, you look at the image and then you come up with something. How do you deal with information that is more of a temporal nature, where you're following a certain case and its development, whether the patient is getting better or worse based on the type of treatment they get?

Yeah, it's an excellent question. Looking at temporal studies is really where we want to go. We haven't done that yet, mostly because of lack of data. Getting the ground-truth data is extremely expensive even for these single temporal snapshots; getting it while following the patient through is even more difficult. But there are also lots of challenges. For instance, in the case of the brain tumor, those are all preoperative images: the images that get acquired when they come up with a diagnosis, before any treatment is done. After the treatment is done, whether it is radiotherapy or surgery, the image will look very different. In particular, in the case of surgery, if the tumor has been
resected, then you will be left with sometimes a really big hole in the brain, and around the hole there will be some scar tissue, which is natural; this is a healthy process of healing. But unfortunately, if you image the patient again, in magnetic resonance that scar tissue is impossible to distinguish from tumor. It looks exactly like a tumor, so you don't know whether the tumor is already growing back or whether that is scar tissue. That is one of the biggest problems right now with this type of tumor. Probably it will be a lot easier than what I just described, but we haven't done it yet, and it is definitely a very interesting area to look at.

Thanks for the talk. Have you included expert data in your features, using it as prior information?

So the experts, the medical experts, the specialists, yes, they work with us all the time. This work cannot happen without medical expertise, and it is very important, if you want to embark on this sort of work, to really connect with local hospitals that have the right level of expertise. We are particularly lucky here because, again, Addenbrooke's is one of the best, but I'm also working with hospitals in France and in the US as well. The experts give us domain knowledge, which is clearly super important, but they also help us get the ground-truth data, the ground-truth labels. We don't necessarily want them to give us features, like visual features, because nobody knows how to design good features. We want the machine learning algorithm to extract the features that are good for the task. This is part of the revolution that is happening right now in machine learning and computer vision, where you have all these deep convolutional neural networks, and the one thing they're really, really good at is figuring out automatically what features work for the task. Some of this
automatic extraction, or at least selection, of features can also be done with decision forests, in a slightly different way. But we try not to design the features too much, because we're always wrong. It's a bit like writing your own if-then statements: they're always wrong.

Hi, thank you. Could you please elaborate on why decision trees work so well on this problem, the problem of segmentation, or even the prediction of the sizes of different regions in the brain?

Sure, there are many answers to this question. One of them is that decision forests are particularly efficient, both at training time and at test time. At training time, being efficient means that you can try many different experiments very quickly, and if you have some high-level parameters, what are called hyperparameters, to tune, you can do that very quickly until it works. At test time it is important to be efficient because the doctor is always in a hurry, and the patient even more so. So that's good.

Then, from an accuracy point of view, it has been observed repeatedly that having multiple trees produces very good accuracy even on previously unseen images, which is really important to stress. Although individual decision trees have this problem of overfitting, learning too much about the training data and too little about generalizing, decision forests are better in that sense.

Having said all of this, decision forests are not necessarily the only thing that works in this case. Having seen what is happening with convolutional neural networks, I have little doubt that they will work very well in these cases as well. In fact we did do some experimentation with convolutional neural networks applied to 3D multi-modality magnetic resonance images, and indeed they do work very well. However, some of you may have tried using convolutional neural networks
already, and training them is very complicated and very time-consuming. Figuring out the right architecture for the network is something that is quite painful to do. So there are advantages and disadvantages. Both of them would work on this problem, but there are still differences.

I think there was a question in the middle there before. Sorry.

I was wondering, do you see the loop being closed, in the sense of the results from machine learning being used to train medical doctors to see these kinds of tumors in the images?

Yes. One way in which I could see these sorts of tools being used: imagine that the doctor looks at the MRI of a patient using their standard visualization tools, which normally come with their PACS station; when they buy the scanner they also buy some visualization tool. You can then augment the visualization tool to say: look, if you click this button you can see my estimate of where the tumor is, and its segmentation map. They can look at it and say "yes, I can see why, that looks good", or, if it is incorrect, as it might be, they say "this is wrong". Effectively, by clicking on a button which says "this is wrong", the data is sent back to some cloud-based repository, where it is labeled as a case the algorithm messed up. Then the algorithm people like myself can look at the data and say: okay, it is not working here, why is that? You try to reason, you try to improve the algorithm and feed it back, and with time the algorithm will become better and better. So this feedback loop could be extremely useful.

But also for the medical doctors: you may detect something that the doctor doesn't see at that moment.

Absolutely. That's why, especially in oncology, they have weekly meetings where multiple doctors discuss the same case at once, which is clearly a very expensive thing to do because
time is money. But multiple oncologists sit around the same table, look at the same data and interpret it together, because of that.

Unfortunately we are out of time, so I would like to ask you to take the remaining questions offline. I would like to thank you all for contributing to the talk with your interesting questions, and Antonio for a very interesting talk. Enjoy, everyone.
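
The anatomy localization part of the talk describes regression-tree leaves that store Gaussian distributions over bounding-box displacements, with the most confident regions acting as anchors. A minimal Python sketch of how such votes might be combined for one face of one bounding box is given here. The `LeafVote` structure, the `fuse_votes` function, the `keep_fraction` parameter and the inverse-variance weighting are all illustrative assumptions, not the speaker's confirmed implementation, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class LeafVote:
    """One voxel's vote for a single bounding-box face (hypothetical structure)."""
    voxel_coord: float   # voxel position along one axis, in mm
    mean_offset: float   # leaf's mean displacement from the voxel to the face, in mm
    variance: float      # leaf's variance for that displacement, in mm^2


def fuse_votes(votes, keep_fraction=0.5):
    """Fuse per-voxel Gaussian votes into one estimate of the face position.

    Assumed rule: keep the most confident votes (smallest variance), then
    average the absolute positions they predict, weighted by inverse variance.
    Returns (estimate, fused_variance).
    """
    if not votes:
        raise ValueError("need at least one vote")
    if any(v.variance <= 0 for v in votes):
        raise ValueError("variances must be positive")
    ranked = sorted(votes, key=lambda v: v.variance)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    kept = ranked[:n_keep]  # these play the role of the "anchor" voxels
    weights = [1.0 / v.variance for v in kept]
    positions = [v.voxel_coord + v.mean_offset for v in kept]
    total = sum(weights)
    estimate = sum(w * p for w, p in zip(weights, positions)) / total
    return estimate, 1.0 / total


# Made-up votes for, say, the upper face of a kidney bounding box.
votes = [
    LeafVote(100.0, 20.0, 4.0),       # predicts 120 mm, confident
    LeafVote(130.0, -8.0, 4.0),       # predicts 122 mm, confident
    LeafVote(300.0, -150.0, 400.0),   # predicts 150 mm, uncertain
    LeafVote(10.0, 90.0, 900.0),      # predicts 100 mm, uncertain
]
estimate, variance = fuse_votes(votes)
print(estimate, variance)  # prints 121.0 2.0
```

With `keep_fraction=0.5` only the two low-variance votes contribute, which mirrors the idea that distinctive regions near the organ dominate the estimate while distant, uncertain voxels are ignored.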

7 comments

  1. Awesome answer on: What are some of the best resources for learning machine learning, specifically as it relates to the healthcare industry?
    Read here: http://qr.ae/TUNOUr

  2. but is it taking into account the holistic side of medicine??? and how plants and natural remedies can also have effects on systems??? there is a whole branch of medicine here being neglected by big pharma, the doctors and these programmers.

  3. "If-then statements are always wrong", for a machine, because the machine has no prior learning. But the same principle applies to human learning of course and what is learned by teaching is a form of parallel processing, like any brain composed of many senses and experiences of "hard and soft-ware".

    "Healthy minds make healthy bodies" is a fundamental principle, and this field is the optimum focus for attention.
