Data augmentation

Last revised by Dimitrios Toumpanakis on 15 Apr 2021

Citation:

Wang D, Toumpanakis D, Yu Jin T, et al. Data augmentation. Reference article, Radiopaedia.org (Accessed on 19 Apr 2024) https://doi.org/10.53347/rID-61722

DOI:

https://doi.org/10.53347/rID-61722

Permalink:

https://radiopaedia.org/articles/61722

rID:

61722

Article created:

16 Jul 2018, David John Wang

Disclosures:

At the time the article was created David John Wang had no recorded disclosures.

Last revised:

15 Apr 2021, Dimitrios Toumpanakis

Disclosures:

At the time the article was last revised Dimitrios Toumpanakis had no recorded disclosures.

Revisions:

8 times, by 6 contributors

Sections:

Artificial Intelligence

Tags:

ai, machine learning

Synonyms:

augmentation

Data augmentation is a technique that artificially increases the amount of training data by adding slightly modified copies of existing data. This increases the diversity of the training set, which helps to reduce overfitting when training a machine learning model and can improve the model's predictive performance.

Training with a larger volume of data usually yields more accurate models, as the model sees a greater variety of examples and generalizes more effectively. Data augmentation is used to counter the problem of data scarcity that is often encountered in machine learning.

Most applications of machine learning in radiology use images as data. For images, data augmentation can be achieved in many ways, for example:

flipping/mirroring the image
rotating the image
adding noise to the image ("noise injection")
color modification
random erasing
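
The transformations listed above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes a grayscale image stored as a nested list of pixel values in the range 0 to 1, and the function names (flip_horizontal, rotate_90, add_gaussian_noise, adjust_brightness, random_erase) are hypothetical. Brightness scaling stands in for "color modification", and the rotation is restricted to 90 degrees for simplicity. In practice an array or imaging library would normally be used instead of nested lists.

```python
import random


def flip_horizontal(image):
    """Flipping/mirroring: reverse each row (left-right mirror)."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotating: turn the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def add_gaussian_noise(image, sigma, rng):
    """Noise injection: add zero-mean Gaussian noise, clipped to [0, 1]."""
    return [
        [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]


def adjust_brightness(image, factor):
    """Intensity ("color") modification: scale pixel values, clipped to [0, 1]."""
    return [[min(1.0, max(0.0, px * factor)) for px in row] for row in image]


def random_erase(image, height, width, rng):
    """Random erasing: set one randomly placed height x width patch to 0."""
    rows, cols = len(image), len(image[0])
    top = rng.randrange(rows - height + 1)
    left = rng.randrange(cols - width + 1)
    erased = [row[:] for row in image]  # copy so the original is untouched
    for r in range(top, top + height):
        for c in range(left, left + width):
            erased[r][c] = 0.0
    return erased


# One original image yields several modified copies for the training set.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
image = [[0.1, 0.2, 0.3],
         [0.4, 0.5, 0.6]]
augmented = [
    flip_horizontal(image),
    rotate_90(image),
    add_gaussian_noise(image, 0.05, rng),
    adjust_brightness(image, 1.2),
    random_erase(image, 1, 2, rng),
]
print(len(augmented), "augmented copies from 1 original image")
```

Each function returns a new image and leaves the original unchanged, so the original and its modified copies can all be added to the training set.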

Caution is advised when using data augmentation on radiological images, since some transformations can produce unrealistic images (e.g. a horizontal flip of a chest x-ray makes the heart appear right-sided, introducing a systematic error of apparent dextrocardia).
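
One way to act on this caution is to make anatomically unsafe transformations opt-in. The sketch below is an illustration under stated assumptions, not an established method: the function augment and its parameters laterality_matters and noise_sigma are hypothetical names, and the image is again assumed to be a nested list of pixel values in the range 0 to 1. Mirroring is applied only when the caller explicitly states that left and right carry no diagnostic meaning.

```python
import random


def flip_horizontal(image):
    """Mirror the image left-to-right (swaps the patient's left and right)."""
    return [row[::-1] for row in image]


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise, clipped to [0, 1]."""
    return [
        [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]


def augment(image, rng, laterality_matters=True, noise_sigma=0.02):
    """Return one augmented copy of `image`.

    Noise injection is always applied. Horizontal flipping is applied
    (with probability 0.5) only when `laterality_matters` is False, so
    that images such as chest x-rays are never mirrored by default.
    """
    out = add_gaussian_noise(image, noise_sigma, rng)
    if not laterality_matters and rng.random() < 0.5:
        out = flip_horizontal(out)
    return out


rng = random.Random(0)  # fixed seed so the sketch is reproducible
chest_image = [[0.0, 0.2, 0.9]]  # toy asymmetric "image"
safe_copy = augment(chest_image, rng)  # never mirrored
free_copy = augment(chest_image, rng, laterality_matters=False)  # may be mirrored
```

Defaulting to the conservative setting means a dataset is mirrored only after someone has decided that doing so is realistic for that anatomy.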

In the case of significant data scarcity, the simple techniques above may be of only limited help. If a dataset is too small, the set of images produced by rotation, mirroring and similar transformations may still be too small for a given problem. In that case, a complementary solution is to generate entirely new, synthetic images, most commonly using generative adversarial networks.