Radiopaedia Blog: Machine learning

This month I was fortunate enough to co-author a really interesting paper in Radiology entitled Chest Radiographs in Congestive Heart Failure: Visualizing Neural Network Learning 1. We described a novel use for GANs (more about these shortly) in helping to visualize disease predictions made by AI - and the results were quite literally revealing. 

Like it or not, artificial intelligence has become a big deal in radiology of late, and while it is almost certainly over-hyped, it is likely that we’ll soon see some integration into clinical practice. In this post, I want to briefly describe our research, show some animated GIFs (always fun) and speculate on the future.

First, a little background on GANs…

What do the three above images have in common? You probably can't tell instantly, but the answer is that none of them are real. Each image was artificially created by a GAN, a Generative Adversarial Network 2,3. The x-ray, the bedroom, and the celebrity dude are all totally fake - although you could argue that every celebrity is fake, but that’s another issue.

GANs are a fascinating form of deep learning in which two neural networks compete against each other (adversarially) to learn how to create fake data. The generator network is tasked with creating fake data (in our case, fake chest x-rays) and the discriminator network is tasked with distinguishing fake data from real data (detecting the fake chest x-rays).

Initially, the generator is terrible at producing fake x-rays and the discriminator spots them all. But the generator learns from these rejections and over many cycles it gets better and better at making x-rays that appear realistic. Likewise, the discriminator gets better and better at spotting even subtle forgeries. Eventually, the generator learns how to create fake data that is indistinguishable from real data (within the limits of its architecture).
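To make the adversarial set-up a little more concrete, here is a minimal sketch of the two competing loss functions from the original GAN formulation, written in Python with NumPy. It assumes the discriminator outputs a probability that an image is real; `d_real` and `d_fake` are hypothetical arrays of those outputs for a batch of real and generated x-rays, and the generator uses the common "non-saturating" variant of its loss. It is an illustration only, not the training code used in the study.

```python
import numpy as np

EPS = 1e-7  # keeps log() finite if a probability is exactly 0 or 1

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: rewards D for scoring real images near 1 and fakes near 0."""
    d_real = np.clip(d_real, EPS, 1 - EPS)
    d_fake = np.clip(d_fake, EPS, 1 - EPS)
    return float(-np.mean(np.log(d_real)) - np.mean(np.log(1 - d_fake)))

def generator_loss(d_fake):
    """Non-saturating generator loss: rewards G when D scores its fakes as real."""
    d_fake = np.clip(d_fake, EPS, 1 - EPS)
    return float(-np.mean(np.log(d_fake)))

# Early in training: D confidently spots the fakes, so D's loss is low and G's is high
early_real, early_fake = np.array([0.9, 0.95]), np.array([0.05, 0.1])
early_d = discriminator_loss(early_real, early_fake)
early_g = generator_loss(early_fake)

# Near equilibrium: the fakes are convincing and D can only guess (about 0.5 for everything)
late_real, late_fake = np.array([0.5, 0.5]), np.array([0.5, 0.5])
late_d = discriminator_loss(late_real, late_fake)
late_g = generator_loss(late_fake)
```

In a real GAN, each network's weights are updated to lower its own loss on alternating steps, which is what drives the cycle of rejection and improvement described above.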

Unlike fake news, fake data is a good thing and can be really, really useful... tremendously useful. I know that seems counterintuitive at first (and at second and at third) but it is true. There are already hundreds of applications for GANs described in the scientific literature, across many disparate fields. So far, however, their use in radiology has been relatively limited.

Now on to our real fake research... and GIFs!

Our idea was to use the example of heart failure prediction to see if a chest x-ray GAN could help reveal the image features learned by a neural network. We basically asked, “okay AI, if you’re so confident that this chest has heart failure, show me what you would change on the x-ray to remove the disease?”. The expectation would be that a well-trained model would highlight traditional features of cardiac failure like cardiomegaly (arrowheads), pleural effusions (arrow) and airspace opacity (star) - which is exactly what it did.

The full technical details are in the paper and supplement 4, but the quick summary is that we used ~100,000 chest x-rays to create a generator capable of producing low-resolution fakes (128 x 128 pixels) from a latent space. We then encoded ~7,000 real chest x-rays into the latent space, trained a smaller neural network to predict heart failure (BNP levels) on these representations, statistically manipulated them to remove the heart failure prediction, and then decoded the result into a fake “healthy” version of the original x-ray.
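The pipeline above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the method from the paper: the "generator" is a fixed random linear map followed by `tanh`, the heart failure predictor is a logistic model on the latent code, and, in place of the statistical manipulation described in the paper, the latent code is simply moved by gradient descent on the predictor's logit until the predicted probability falls below a threshold. All names (`generate`, `predict`, `remove_prediction`) are hypothetical, and `z_xray` stands in for a real x-ray that has already been encoded into the latent space.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, SIZE = 16, 8  # toy sizes; the study's generator produced 128 x 128 images

# Hypothetical stand-ins for a trained generator and a trained latent-space predictor
W = rng.normal(size=(SIZE * SIZE, LATENT_DIM)) / np.sqrt(LATENT_DIM)
v = rng.normal(size=LATENT_DIM)
b = 0.0

def generate(z):
    """Toy generator: decode a latent code into a SIZE x SIZE image in [-1, 1]."""
    return np.tanh(W @ z).reshape(SIZE, SIZE)

def predict(z):
    """Toy predictor: probability of the positive label (e.g. elevated BNP)."""
    return 1.0 / (1.0 + np.exp(-(v @ z + b)))

def remove_prediction(z, target=0.1, step=0.05, max_iters=10_000):
    """Move z down the gradient of the logit (v.z + b) until predict(z) <= target."""
    z = np.asarray(z, dtype=float).copy()
    for _ in range(max_iters):
        if predict(z) <= target:
            break
        z -= step * v  # d(logit)/dz = v for this linear predictor
    return z

# Pretend this latent code came from encoding an x-ray predicted to show heart failure
z_xray = 3.0 * v / np.linalg.norm(v)
z_healthy = remove_prediction(z_xray)

original_img = generate(z_xray)
healthy_img = generate(z_healthy)
change = healthy_img - original_img  # raw material for a GVR-style overlay
```

Flipping the sign of the update (`z += step * v`) would instead push a normal x-ray towards a heart failure prediction, which is the idea behind the inverse GVRs described below.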

By superimposing the predicted change over the original x-ray, we create what we call a Generative Visual Rationale (GVR). Orange represents density that the model would remove, and purple density that it would add, in order to remove the prediction of heart failure. Here’s an animated GIF (as promised) showing the model dynamically lowering its heart failure prediction and the associated GVR.
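The colour overlay itself can be sketched as follows. This assumes greyscale images scaled to [0, 1] where brighter means denser, and uses approximate RGB values for orange and purple; `gvr_overlay` is a hypothetical helper, and the exact colour mapping and scaling used in the paper may differ.

```python
import numpy as np

ORANGE = np.array([1.0, 0.5, 0.0])  # approximate RGB, marks density the model would remove
PURPLE = np.array([0.5, 0.0, 0.5])  # approximate RGB, marks density the model would add

def gvr_overlay(original, healthy, alpha=0.6):
    """Blend an orange/purple change map over a greyscale image with values in [0, 1]."""
    diff = healthy - original                  # < 0: density removed, > 0: density added
    scale = np.abs(diff).max() or 1.0          # normalise; avoid dividing by zero if unchanged
    removed = np.clip(-diff, 0, None) / scale  # per-pixel weight for orange
    added = np.clip(diff, 0, None) / scale     # per-pixel weight for purple
    base = np.repeat(original[..., None], 3, axis=-1)  # greyscale -> RGB
    colour = removed[..., None] * ORANGE + added[..., None] * PURPLE
    weight = alpha * np.maximum(removed, added)[..., None]
    return (1 - weight) * base + weight * colour
```

Unchanged pixels keep their original grey value, so the colour only appears where the model wants to alter the image.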


Seeing beyond the expected

However, heart failure was not all that the GVRs revealed. You’ll note above that the chest wall highlights purple and breast tissue orange. That's odd, right? But not when you consider that we used blood levels of B-type natriuretic peptide (BNP) as our label for heart failure, and that BNP has a known independent negative association with obesity and positive association with female sex 5,6. So the model was, in fact, using image features not associated with heart failure to improve its BNP predictions, and the GVRs conveyed this.

Side markers were another predictive factor that the GVRs exposed. The model would often add a conventional (non-digital) side marker when attempting to remove a heart failure prediction, probably because at our institution conventional side markers are primarily used in non-urgent settings, where patients are more likely to be well and to have a low pre-test probability of heart failure. So the AI was using the external marker to help game its predictions. Look back at the first GIF to see this happen on the patient's right.

We also took normal chest x-rays and asked the model to give them heart failure (inverse GVRs). These confirmed again that cardiomegaly, pleural effusions and airspace opacity had been learned as signs of heart failure, but also that pacemakers had been learned - materializing as if from nowhere in another GIF!


You might ask - were we simply imposing our own preconceived notions on the GVRs? To test this, we compared GVRs from our well-trained model to a deliberately overfitted model that had seen the test data during training (a big deep learning no-no). Our hypothesis was that the overfitted model would perform extremely well on the test data (because of memorization) but that it would not produce very meaningful GVRs. Sure enough, blinded GVR assessment by a radiologist and radiology registrar confirmed this, with only 36% highlighting potential heart failure features compared to 80% from the well-trained model.
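Why a model that has seen the test data looks deceptively good can be shown with a toy example: a 1-nearest-neighbour "model" that simply memorizes its training set, trained on labels that are pure noise. This is a hypothetical illustration using synthetic data, not the overfitted network from the study.

```python
import numpy as np

rng = np.random.default_rng(42)
X_train = rng.normal(size=(500, 10))
y_train = rng.integers(0, 2, size=500)  # labels are pure noise: nothing real to learn
X_new = rng.normal(size=(500, 10))
y_new = rng.integers(0, 2, size=500)

def nn_predict(X_fit, y_fit, X):
    """1-nearest-neighbour: copy the label of the closest memorized example."""
    dists = ((X[:, None, :] - X_fit[None, :, :]) ** 2).sum(axis=-1)
    return y_fit[dists.argmin(axis=1)]

# "Test" set that leaked into training: every point finds itself, so accuracy is perfect
leaky_acc = (nn_predict(X_train, y_train, X_train) == y_train).mean()
# Genuinely unseen data: memorization buys nothing, so accuracy sits near chance
honest_acc = (nn_predict(X_train, y_train, X_new) == y_new).mean()
```

The leaked evaluation reports perfect accuracy even though the model has learned nothing, which is why test performance alone could not distinguish the two models in our experiment.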

So, what does this mean for the future?

Well, arguably for the first time we now have a method for visualizing AI predictions in medical imaging that goes beyond identifying which image patches contribute to the final prediction. We have a technique that can reveal global image features in combination. From a safety perspective, this is a welcome advance, as it allows radiologists to confirm that individual predictions are reasonable, and to better detect AI faults, cheating, and biases.

The major current limitation to our method is GAN resolution, although it seems likely that this will be overcome 3. The architecture needed for GVRs is also different to commonly used neural networks and so this may further limit use, especially if the predictive power of GVR-friendly techniques is inferior.

Extrapolating further, it is conceivable that GVRs could soon be used to uncover imaging signs of disease previously unknown to humans. It's also conceivable that instead of visually predicting disease, the technique could be used to visually predict the future. “Hey AI, show me what you think this lesion/mass/bleed will look like tomorrow? Or next year?”. The amount of follow-up imaging performed on our patients is so large, and time is such an accessible and definite label, that training a radiology "pre-cognition" system is possibly not that far-fetched.
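As a rough illustration of why time is such a convenient label, the sketch below turns a hypothetical list of studies, each recorded as `(patient_id, study_date, image_id)`, into training pairs of (earlier image, later image, days elapsed). The record layout, the function name `follow_up_pairs`, and the choice to pair only consecutive studies are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import date

def follow_up_pairs(studies):
    """Group studies by patient and pair each study with that patient's next one."""
    by_patient = defaultdict(list)
    for patient_id, study_date, image_id in studies:
        by_patient[patient_id].append((study_date, image_id))
    pairs = []
    for items in by_patient.values():
        items.sort()  # chronological order within each patient
        for (d0, img0), (d1, img1) in zip(items, items[1:]):
            pairs.append((img0, img1, (d1 - d0).days))  # the time gap is the free label
    return pairs

studies = [
    ("p1", date(2018, 1, 31), "b"),
    ("p1", date(2018, 1, 1), "a"),
    ("p2", date(2018, 3, 1), "c"),  # single study: no follow-up pair
    ("p1", date(2018, 3, 2), "d"),
]
pairs = follow_up_pairs(studies)
```

Each pair would let a model learn to map an earlier image (plus an interval) to a later one, with no manual annotation required.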


About The Authors: Dr. Andrew Dixon (last author, blog author) is a radiologist and Co-Director of Radiology Training at the Alfred Hospital in Melbourne. He is Academic Director for Radiopaedia. Dr. Jarrel Seah (first author) is a radiology registrar at the Alfred Hospital in Melbourne. Dr. Jennifer Tang (second author) is a radiology registrar at the Royal Melbourne Hospital. Andy Kitchen (third author) is a machine learning researcher and organizer of the Melbourne Machine Learning & AI Meetup. Associate Professor Frank Gaillard (fourth author) is a neuroradiologist and Director of Research in the University of Melbourne Department of Radiology and Royal Melbourne Hospital. He is Founder and Editor in Chief of Radiopaedia. 

1. Seah JCY, Tang JSN, Kitchen A, Gaillard F, Dixon AF. Chest Radiographs in Congestive Heart Failure: Visualizing Neural Network Learning. (2018) Radiology. doi:10.1148/radiol.2018180887

2. Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative Adversarial Networks. (2014)

3. Karras T, Aila T, Laine S, Lehtinen J. Progressive Growing of GANs for Improved Quality, Stability, and Variation. (2017)

4. Seah J, Tang J, Kitchen A, Seah J. Generative Visual Rationales. (2018)

5. Clerico A, Giannoni A, Vittorini S, Emdin M. The paradox of low BNP levels in obesity. (2012) Heart Failure Reviews. 17 (1): 81-96. doi:10.1007/s10741-011-9249-z

6. Hsich EM, Grau-Sepulveda MV, Hernandez AF, Eapen ZJ, Xian Y, Schwamm LH, Bhatt DL, Fonarow GC. Relationship between sex, ejection fraction, and B-type natriuretic peptide levels in patients hospitalized with heart failure and associations with inhospital outcomes: findings from the Get With The Guideline-Heart Failure Registry. (2013) American Heart Journal. 166 (6): 1063-1071.e3. doi:10.1016/j.ahj.2013.08.029

The open source movement is revolutionising medicine. Never before in human history has there been such knowledge and opportunity available to anyone with perseverance and a connected device. In fact, with enough patience, one can acquire a seemingly endless range of tools and skills that enable quite sophisticated analysis of medical images (among many other areas of science and medicine). I’d like to explain how the ‘stars have aligned’ for this revolution and glimpse future possibilities, whilst also acknowledging a degree of hype surrounding AI and its application to medicine.

Punch card data entry (image: Wikimedia Commons).

To put the present in some sort of context, my father-in-law took a computer subject at university in the seventies. Working in large groups, one of their assignments was to punch holes into a long piece of paper, which they fed into a computer to produce a very basic game of ping pong. This computer was state-of-the-art at the time and took up several storeys of a university building. Mobile phone users are expected to tick over 5 billion next year, each phone capable of providing vast amounts of knowledge and at least theoretical training for many different skills to anyone who can afford one (not everyone). Who knows what computational power and device size will be commonplace in another 3 or 4 decades. Futurist Ray Kurzweil has, for example, predicted that by 2049, one computer will have more computational power than the entire human race combined.

Radiopaedia was preceded by more general open source platforms, and all manner of these now inhabit many corners of the web, including growing and increasingly comprehensive biobanks rich with patient-level data. The gradual specialization of open source sites is not unique to science and medicine; for example, there are now numerous open source communities that foster the learning and progress of programming languages like Python.

Whilst vanilla linear and logistic regression have been around for a long time (linear regression since the early nineteenth century, logistic regression since around the 1950s), we now have devices that can crunch these algorithms en masse with a few lines of code. Enter machine learning. For free, you can spend a few hours (ok, probably days or weeks) processing millions of data points to draw insights and make reasonable predictions about new data. If you are pressed for time or perhaps less technically inclined, there are palatable discussions of cutting-edge technologies: one podcast, for example, features regular episodes in which a data scientist explains concepts to his non-data-scientist partner, and one of those episodes is a great start for anyone curious about applying machine learning to medical imaging.

The scope and complexity of mathematical models for predictive and other analytics continue to expand, and with open source code, you don’t have to be Good Will Hunting to implement them. Predicting mortality from chest CTs using radiomics techniques has been demonstrated as a proof of concept (Oakden-Rayner et al 2017). Machine learning (including deep learning) ought to expand the detection of pre-clinical disease states before the patient develops symptoms, and could be a stimulus for much wider uptake of medical imaging. Such a proliferation of image acquisition poses another set of questions for the radiology field. Pre-clinical detection is applicable to some diseases more than others, but perhaps even apparently unforeseeable events like major trauma will one day be accurately predicted by a network of biobanks, machine learning algorithms and an internet of things (IoT).
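To give a sense of what "a few lines of code" can mean, here is a minimal logistic regression fitted by plain gradient descent with NumPy on 100,000 synthetic data points. The data, learning rate and number of epochs are arbitrary choices for illustration; in practice a well-tested library implementation would usually be used instead.

```python
import numpy as np

def sigmoid(t):
    """Logistic function, with inputs clipped to avoid overflow in exp."""
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30, 30)))

def add_bias(X):
    """Append a column of ones so the intercept is learned as an ordinary weight."""
    return np.hstack([X, np.ones((len(X), 1))])

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Minimise the mean binary cross-entropy by full-batch gradient descent."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w

def predict_proba(w, X):
    return sigmoid(add_bias(X) @ w)

# Synthetic data: 100,000 points whose labels follow a known logistic model
rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 2))
logits_true = X @ np.array([2.0, -3.0]) + 0.5
y = (logits_true + rng.logistic(size=len(X)) > 0).astype(float)

w = fit_logistic(X, y)
accuracy = ((predict_proba(w, X) > 0.5) == y).mean()
```

Because the labels are noisy by construction, even a perfectly fitted model cannot reach 100% accuracy here; the learned weights should, however, point in the same direction as the true ones.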
Regardless, technological advances spur on precision medicine, which will eventually be genome- and probably environment-specific. Social and ethical debates about how this may widen the gap between the ‘haves’ and ‘have-nots’ are inevitable and desirable. There are many examples of technological advances spreading for global benefit; mobile phones and Radiopaedia itself are great examples.

For at least the next 30 years, there will be radiologists in some form or another. We are a long way from any form of AI being able to listen to, digest and give salient advice about complex medical histories and examinations; advise colleagues on the pivotal factors in choosing between modalities; be perspicacious in high-stakes multidisciplinary meetings; or perform complicated procedures. There are also less easily defined roles for the human touch: the laying-on of hands, or the thoughtful, attentive and knowing nod that patients appreciate and that, as any clinician will attest, can seem therapeutic in and of themselves. For the foreseeable future, deep learning algorithms will need more than just a handful of examples of a given condition. The deep learning results prompted by the US National Institutes of Health (NIH) open source chest x-ray database, whilst pivotal and theoretically exciting, have been confined to certain entities (eg pneumonia, cardiomegaly, pneumothorax) and are not yet feasible for coal-face translation to the workstation. It will be a while yet before workstation software can effectively point out uncommon findings like the Luftsichel sign. So for at least a few decades to come, Radiopaedia will remain a useful tool for us humans in recognizing rare and uncommon conditions, and trainees will still be poring over thousands of chest x-rays each.

This combination of open source capabilities marks the very exciting infancy of radiomics: analysis beyond what is (my new favorite term...) ‘human-readable’. We can now process medical image data on a scale that would make Wilhelm Roentgen physically (and metaphysically) ill. It is an incredibly exciting time to be a part of what some are calling ‘the fourth industrial revolution’. Only time will tell if these kinds of statements are hype, but for sure, we have only just glimpsed the tip of the open source medical data iceberg, and Radiopaedia is strapped in for the ride.
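For readers wondering what features "beyond human-readable" might look like, here is a minimal sketch of a few first-order (histogram-based) radiomics features computed over a region of interest. It is a simplified illustration with an assumed bin count of 32 and a hypothetical helper name; it is not the feature set used by Oakden-Rayner et al., and dedicated open source radiomics libraries compute many more (texture, shape) features with standardized definitions.

```python
import numpy as np

def first_order_features(image, mask, bins=32):
    """Summarise the intensity distribution of the voxels inside a boolean mask."""
    voxels = image[mask.astype(bool)].astype(float)
    mean = voxels.mean()
    std = voxels.std()
    if std > 0:
        z = (voxels - mean) / std
        skewness = (z ** 3).mean()
        kurtosis = (z ** 4).mean() - 3.0  # excess kurtosis (0 for a normal distribution)
    else:
        skewness = kurtosis = 0.0  # a perfectly uniform region has no spread to describe
    counts, _ = np.histogram(voxels, bins=bins)
    p = counts[counts > 0] / counts.sum()
    entropy = float(-(p * np.log2(p)).sum())  # Shannon entropy of the histogram, in bits
    return {
        "mean": float(mean),
        "std": float(std),
        "skewness": float(skewness),
        "kurtosis": float(kurtosis),
        "entropy": entropy,
    }
```

Each lesion or organ then becomes a vector of numbers that can be fed into a model like the logistic regression sketched earlier, whether or not a human could perceive those differences by eye.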

About the author: James Condon graduated in medicine in 2014 and is commencing a PhD in 2018 on the use of computer vision for medical image interpretation. He works casually in emergency medicine and clinical trials and has previously completed a range of medical and surgical rotations in Adelaide.

Disclosure: J. Condon is commencing independent post-graduate research with G. Carneiro and L. Palmer, co-authors of a journal article referenced in this piece. They were not involved in the writing of this blog. 

Disclaimer: Views expressed in blog posts are those of the author and not necessarily those of Radiopaedia or his/her employer.
