Data augmentation

Last revised by Dimitrios Toumpanakis on 15 Apr 2021

Data augmentation is a technique that increases the amount of data by adding slightly modified copies of already existing data. This increases the diversity of the training set, which helps to reduce overfitting when training a machine learning model and can have a positive effect on the model's predictive performance.

Training with a larger volume of data usually yields more accurate models, as the model sees a greater variety of examples and can generalize more effectively. Data augmentation is used to counter the problem of data scarcity that is often encountered in machine learning.

Most applications of machine learning in radiology involve images as data. For images, data augmentation can be achieved in many ways, for example:

  • flipping/mirroring the image
  • rotating the image
  • adding noise to the image ("noise injection")
  • color modification
  • random erasing
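
As an illustration, the transformations listed above can be sketched in a few lines of Python with NumPy. This is a minimal sketch rather than a reference implementation: the function name `augment`, the noise level, the brightness factor and the size of the erased patch are all assumptions chosen for the example, and a random array stands in for a real grayscale image scaled to [0, 1]. Since radiological images are usually grayscale, a brightness change is used here as a stand-in for color modification.

```python
import numpy as np


def augment(image, rng):
    """Return augmented copies of a 2D grayscale image with values in [0, 1].

    All parameter values below are illustrative assumptions.
    """
    copies = {}

    # Flipping/mirroring: reverse the column order (left-right mirror).
    copies["flipped"] = np.fliplr(image)

    # Rotating: a 90-degree counter-clockwise rotation.
    copies["rotated"] = np.rot90(image, k=1)

    # Noise injection: add zero-mean Gaussian noise, then clip to [0, 1].
    noise = rng.normal(loc=0.0, scale=0.05, size=image.shape)
    copies["noisy"] = np.clip(image + noise, 0.0, 1.0)

    # Intensity modification (grayscale analogue of color modification).
    copies["brighter"] = np.clip(image * 1.2, 0.0, 1.0)

    # Random erasing: zero out a randomly placed rectangular patch.
    height, width = image.shape
    patch_h, patch_w = max(1, height // 4), max(1, width // 4)
    top = int(rng.integers(0, height - patch_h + 1))
    left = int(rng.integers(0, width - patch_w + 1))
    erased = image.copy()
    erased[top:top + patch_h, left:left + patch_w] = 0.0
    copies["erased"] = erased

    return copies


rng = np.random.default_rng(seed=0)
image = rng.random((64, 64))  # placeholder for a real image
augmented = augment(image, rng)
```

Each entry in `augmented` is a modified copy that would be added to the training set with the same label as the original image.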

Caution is advised when using data augmentation on radiological images, since some transformations can produce unrealistic images (e.g. a horizontal flip of a chest X-ray mimics dextrocardia and so introduces a systematic error).
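
One way to respect this caution in code is to make side-changing transformations opt-in. The sketch below is a hypothetical example in Python with NumPy: the function `augment_chest_xray` and its `allow_horizontal_flip` parameter are names invented for illustration, and the noise level is an arbitrary assumption. By default only mild noise is applied, so the left-right orientation of the image is preserved.

```python
import numpy as np


def augment_chest_xray(image, rng, allow_horizontal_flip=False):
    """Augment a 2D image in [0, 1] without changing its laterality by default.

    Hypothetical helper: the flip must be enabled explicitly by the caller.
    """
    out = image

    # A horizontal flip swaps left and right, so it is disabled by default.
    if allow_horizontal_flip and rng.random() < 0.5:
        out = np.fliplr(out)

    # Mild Gaussian noise does not alter the anatomy shown in the image.
    noise = rng.normal(loc=0.0, scale=0.01, size=out.shape)
    return np.clip(out + noise, 0.0, 1.0)


rng = np.random.default_rng(seed=0)
image = np.zeros((8, 8))
image[4, 6] = 1.0  # a bright marker on one side of the toy image
safe_copy = augment_chest_xray(image, rng)
```

With the default setting, the marker in `safe_copy` remains on the same side as in the original image.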

In the case of significant data scarcity, the simple techniques above may be of only limited help. If a dataset is too small, then the set of images produced by rotation, mirroring, etc. may still be too small for a given problem. In that case, a complementary solution is to generate entirely new, synthetic images, commonly with generative adversarial networks.
