Two Minute Papers: DeepMind’s New AI Helps Detecting Breast Cancer
This transcript was generated by an AI at Otter.ai
Dear Fellow Scholars,
this is Two Minute Papers with Dr. Károly Zsolnai-Fehér here. These days, we see so many amazing uses for learning-based algorithms, from enhancing computer animations and teaching virtual animals to walk to teaching self-driving cars depth perception, and more. It truly feels like no field of science is left untouched by these new techniques, including the medical sciences.
You see, in medical imaging, a common problem is that there are so many diagnostic images out there in the wild that it is becoming more and more infeasible for doctors to look at all of them. What you see here is a work from scientists at DeepMind Health that we covered a few hundred episodes ago. Its training part uses about 14,000 optical coherence tomography scans; this is the OCT label that you see on the left. These images are cross sections of the human retina. We first start out with this OCT scan, and then a manual segmentation step follows, where a doctor marks up the image to show where the relevant parts, such as the retinal fluids or the elevations of the retinal pigments, lie.
After the learning process, this method can reproduce these segmentations really well by itself, without the doctor’s supervision, and you see here that the two images are almost identical in these tests. Now that we have the segmentation map, it is time to perform classification. This means that we look at this map and assign a probability to each possible condition that may be present. Finally, based on these probabilities, a verdict is made on whether the patient needs to be seen urgently, needs just a routine check, or perhaps needs no check at all. This was an absolutely incredible piece of work. However, it is of utmost importance to evaluate these tools together with experienced doctors, and hopefully on international datasets.
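The last step of the retinal pipeline, turning per-condition probabilities into an urgent / routine / no-check verdict, can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the condition names, the condition-to-referral mapping, the 0.5 threshold, the cautious fallback, and the function name `referral_decision` are hypothetical and not taken from the DeepMind system.

```python
# Hypothetical referral levels, ordered from most to least urgent.
REFERRAL_ORDER = ["urgent", "routine", "none"]

# Hypothetical mapping from a detected condition to the referral it triggers.
CONDITION_TO_REFERRAL = {
    "choroidal_neovascularization": "urgent",
    "macular_edema": "urgent",
    "drusen": "routine",
    "normal": "none",
}


def referral_decision(condition_probs: dict[str, float], threshold: float = 0.5) -> str:
    """Return the most urgent referral whose condition probability reaches the threshold."""
    # Check levels from most to least urgent, so one confident urgent finding wins.
    for level in REFERRAL_ORDER:
        for condition, prob in condition_probs.items():
            if CONDITION_TO_REFERRAL.get(condition) == level and prob >= threshold:
                return level
    # Nothing is confident enough: fall back to a routine check (an assumed, cautious default).
    return "routine"


print(referral_decision({"macular_edema": 0.8, "drusen": 0.6, "normal": 0.1}))
print(referral_decision({"normal": 0.9, "drusen": 0.1}))
```

Checking the most urgent level first reflects the idea that a missed urgent case is worse than an unnecessary check; a real system would calibrate these thresholds rather than hard-code them.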
Since then, in this new work, DeepMind has knocked the evaluation out of the park for a system they developed to detect breast cancer as early as possible. Let’s briefly talk about the technique, and then I’ll try to explain why it is sinfully difficult to evaluate it properly. So, on to the new problem. Each of these mammography exams contains four images that show the breasts from two different angles, and the goal is to predict whether a biopsy taken later will be positive for cancer or not. This is especially important because early detection is key to treating these patients.
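To pin down the input and output of this task, here is a hypothetical Python sketch of a four-image exam and one naive way to turn per-image scores into a single biopsy prediction. It assumes the two angles are the standard craniocaudal (CC) and mediolateral oblique (MLO) views and that some model already produced a score per image; the mean-then-max aggregation and the 0.5 threshold are purely illustrative and are not how the DeepMind system combines its inputs.

```python
from dataclasses import dataclass


@dataclass
class ScreeningExam:
    """One mammography exam: two views (CC and MLO) of each breast."""

    left_cc: float  # hypothetical per-image malignancy score in [0, 1]
    left_mlo: float
    right_cc: float
    right_mlo: float

    def case_score(self) -> float:
        # Naive aggregation: each breast gets the mean of its two views,
        # and the case is as suspicious as its most suspicious breast.
        left = (self.left_cc + self.left_mlo) / 2
        right = (self.right_cc + self.right_mlo) / 2
        return max(left, right)


def predict_biopsy_positive(exam: ScreeningExam, threshold: float = 0.5) -> bool:
    """Binary prediction of whether a later biopsy would come back positive."""
    return exam.case_score() >= threshold


exam = ScreeningExam(left_cc=0.25, left_mlo=0.25, right_cc=0.75, right_mlo=0.5)
print(exam.case_score(), predict_biopsy_positive(exam))  # 0.625 True
```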
And the key question is, how does it compare to the experts? Have a look here. This is a case of cancer that was missed by all six experts in the study, but was correctly identified by the AI. And what about this one? This case didn’t work so well. It was caught by all six experts, but was missed by the AI. So one reassuring sample and one failed sample.
And with this, we have arrived at the central thesis of the paper, which asks the question: what does it really take to say that an AI system has surpassed human experts? To even have a fighting chance at tackling this, we have to measure false positives and false negatives. A false positive means that the AI mistakenly predicts that the sample is positive when in reality it is negative. A false negative means that the AI thinks the sample is negative, whereas in reality it is positive.
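To make the two error types concrete, here is a small Python sketch that counts them from binary predictions and ground-truth biopsy outcomes, then turns the counts into rates. `confusion_counts` and `error_rates` are illustrative names, not part of any library or of the paper's code.

```python
def confusion_counts(predictions: list[bool], labels: list[bool]) -> dict[str, int]:
    """Count outcomes; True means 'cancer' (prediction) or 'biopsy positive' (label)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for pred, truth in zip(predictions, labels):
        if pred and truth:
            counts["tp"] += 1
        elif pred and not truth:
            counts["fp"] += 1  # predicted positive, actually negative
        elif not pred and truth:
            counts["fn"] += 1  # predicted negative, actually positive
        else:
            counts["tn"] += 1
    return counts


def error_rates(counts: dict[str, int]) -> tuple[float, float]:
    """False positive rate among true negatives, false negative rate among true positives."""
    negatives = counts["fp"] + counts["tn"]
    positives = counts["fn"] + counts["tp"]
    fpr = counts["fp"] / negatives if negatives else float("nan")
    fnr = counts["fn"] / positives if positives else float("nan")
    return fpr, fnr


counts = confusion_counts([True, True, False, False, True], [True, False, True, False, False])
print(counts)  # {'tp': 1, 'fp': 2, 'tn': 1, 'fn': 1}
```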
The key is that in every decision domain, the permissible rates of false negatives and false positives are different. Let me try to explain this through an example. In cancer detection, a sick patient who gets classified as healthy is a grave mistake that can lead to serious consequences. But if a healthy patient is misclassified as sick, the positive cases get a second look from a doctor who can easily identify the mistake. The consequences in this case are much less problematic and can be remedied by spending a little time checking the samples that the AI was less confident about.
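This asymmetry can be made explicit by assigning each error type a cost and picking the decision threshold that minimizes the total. The sketch below assumes the AI outputs a suspicion score per case and that the costs `cost_fp` and `cost_fn` are chosen by the user; the scores, labels, and cost values in the example are made up purely to show how the best threshold shifts.

```python
import math


def total_cost(scores: list[float], labels: list[bool], threshold: float,
               cost_fp: float, cost_fn: float) -> float:
    """Weighted error cost when every score >= threshold is flagged as positive."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    return fp * cost_fp + fn * cost_fn


def best_threshold(scores: list[float], labels: list[bool],
                   cost_fp: float, cost_fn: float) -> tuple[float, float]:
    """Return (threshold, cost) with the lowest total cost; ties keep the lower threshold."""
    # Candidates: every observed score, plus +inf meaning "flag nothing".
    candidates = sorted(set(scores)) + [math.inf]
    best_t, best_c = candidates[0], math.inf
    for t in candidates:
        c = total_cost(scores, labels, t, cost_fp, cost_fn)
        if c < best_c:
            best_t, best_c = t, c
    return best_t, best_c


scores = [0.1, 0.3, 0.4, 0.8]
labels = [False, True, False, True]
print(best_threshold(scores, labels, cost_fp=1.0, cost_fn=10.0))  # (0.3, 1.0)
print(best_threshold(scores, labels, cost_fp=10.0, cost_fn=1.0))  # (0.8, 1.0)
```

When missed cancers are expensive, the best threshold drops and more cases are flagged for a second look; when false alarms are expensive, it rises.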
The bottom line is that there are many different ways to interpret the data, and it is by no means trivial to find out which one is the right one. And now, hold on to your papers, because here comes the best part. If we compare the predictions of the AI to those of the human experts, we see that in the US, false positives have been reduced by 5.7% and false negatives by 9.4%, in absolute terms. That is the holy grail. We don’t need to weigh the cost of false positives against that of false negatives here, because the system reduced both at the same time. Spectacular.
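Why a simultaneous reduction sidesteps the cost question can be written out in a few lines. Assume (this notation is ours, not the paper's) a fixed disease prevalence pi in the screened population, nonnegative costs c_FP and c_FN per error, and false positive and false negative rates measured on the same population before and after:

```latex
\begin{aligned}
C &= c_{\mathrm{FP}}\,(1-\pi)\,\mathrm{FPR} + c_{\mathrm{FN}}\,\pi\,\mathrm{FNR},\\
\Delta C &= c_{\mathrm{FP}}\,(1-\pi)\,\Delta\mathrm{FPR} + c_{\mathrm{FN}}\,\pi\,\Delta\mathrm{FNR} \;\le\; 0
\quad\text{whenever } \Delta\mathrm{FPR} \le 0 \text{ and } \Delta\mathrm{FNR} \le 0.
\end{aligned}
```

Each term of the change is a nonnegative weight times a nonpositive change, so the expected cost cannot go up for any choice of costs, which is why no trade-off has to be argued.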
Another important detail is that these numbers came out of an independent evaluation. It means that the results did not come from the scientists who wrote the algorithm, and have been thoroughly checked by independent experts who have no vested interest in this project. This is the reason why you see so many authors on this paper. Excellent.
Another interesting tidbit is that the AI was trained on subjects from the UK, and the question was: how well does this knowledge generalize to subjects from other places, for instance, the United States? Is this UK knowledge reusable in the US? I was quite surprised by the answer, because the system never saw a sample from anyone in the US and still did better than the experts on US data. This is a very reassuring property, and I hope to see more studies that show how general the knowledge is that these systems are able to obtain through training. And perhaps most importantly, if you remember one thing from this video, let it be the following.
This work, much like other AI-infused medical solutions, is not made to replace human doctors; the goal is instead to empower them and take as much weight off their shoulders as possible. We have hard numbers for this: in a simulation of the UK’s double-reading process, where each scan is normally read by two experts, the system reduced the workload of the second reader by 88%, which is an incredible result. Among other far-reaching consequences, I would like to mention that this would substantially help not only the work of doctors in wealthier, more developed countries, but it may also single-handedly enable proper cancer detection in developing countries that cannot afford to have these scans checked.
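One way to see how AI assistance can cut reading workload is a toy model of double reading, where every exam is normally read by two people. The sketch assumes, for illustration only, that the AI takes the place of the second reader and that a human second reader is consulted only when the AI and the first reader disagree. The function name and the example decisions are hypothetical and are not the paper's data or its exact simulation protocol.

```python
def second_reader_workload_reduction(first_reader: list[bool], ai: list[bool]) -> float:
    """Fraction of human second reads avoided under the assumed protocol:
    a human second reader is needed only when the AI and the first reader disagree."""
    if not first_reader or len(first_reader) != len(ai):
        raise ValueError("need two equally long, non-empty lists of recall decisions")
    disagreements = sum(1 for a, b in zip(first_reader, ai) if a != b)
    return 1 - disagreements / len(first_reader)


# Hypothetical recall decisions (True = recall the patient) for eight exams.
first = [True, False, False, False, True, False, False, False]
ai = [True, False, False, True, True, False, False, False]
print(second_reader_workload_reduction(first, ai))  # 0.875
```

Under this model, the workload saving is simply the agreement rate between the AI and the first reader, so a highly accurate AI that rarely disagrees with the first reader spares most second reads.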
And note that in this video, we have truly just scratched the surface. Whatever we talk about here in a few minutes cannot be as rigorous and accurate a description as the paper itself, so make sure to check it out; the link is in the video description. And with that, I hope you now have a good feel for the pace of progress in machine learning research.
The retinal fluid project was state of the art in 2018, and now, less than two years later, we have properly, independently evaluated AI-based detection of breast cancer. Bravo, DeepMind. What a time to be alive.
Thanks for watching and for your generous support, and I’ll see you next time.