Information leakage
Last revised by Yusra Sheikh on 2 Aug 2021
Citation:
Botz B, Sheikh Y, Information leakage. Reference article, Radiopaedia.org (Accessed on 23 Apr 2024) https://doi.org/10.53347/rID-83268
rID: 83268
Article created: 18 Oct 2020, Bálint Botz
Disclosures: At the time the article was created Bálint Botz had no recorded disclosures.
Last revised: 2 Aug 2021, Yusra Sheikh
Disclosures: At the time the article was last revised Yusra Sheikh had no recorded disclosures.
Revisions: 2 times, by 2 contributors
Information leakage (also called data leakage) is a common and important data-handling error in machine learning applications, including those in radiology. It refers to incomplete separation of the training, validation, and testing datasets, which can significantly change the apparent performance of the algorithm, typically making it look better than it would be on genuinely unseen data.
Because overlap between datasets is a critical source of bias, the data should be split at the very beginning of the study, before any further processing step (e.g. feature extraction), since any step performed on the pooled data can itself introduce leakage 1.
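As an illustration of splitting before any further processing, the following is a minimal Python sketch and not a prescribed method. It assumes a hypothetical toy dataset in which each record carries a `patient_id` and a single numeric `feature`; the helper names `split_by_patient`, `fit_scaler`, and `apply_scaler` are invented for this example. The split is made at patient level so that images from one patient never land in both subsets, and the scaling statistics are computed from the training subset only.

```python
import random
import statistics


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split records so that each patient appears in exactly one subset."""
    patient_ids = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [r for r in records if r["patient_id"] not in test_ids]
    test = [r for r in records if r["patient_id"] in test_ids]
    return train, test


def fit_scaler(train):
    """Compute scaling statistics from the training subset only."""
    values = [r["feature"] for r in train]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # avoid division by zero
    return mean, std


def apply_scaler(records, mean, std):
    """Apply previously fitted statistics without re-estimating them."""
    return [{**r, "feature": (r["feature"] - mean) / std} for r in records]


# Hypothetical toy data: 10 patients with 3 images each.
records = [
    {"patient_id": f"P{p:02d}", "image_id": f"P{p:02d}_{i}", "feature": float(p * 3 + i)}
    for p in range(10)
    for i in range(3)
]

# 1. Split first, at patient level.
train, test = split_by_patient(records)

# 2. Only then fit preprocessing, using the training subset alone.
mean, std = fit_scaler(train)
train_scaled = apply_scaler(train, mean, std)
test_scaled = apply_scaler(test, mean, std)

overlap = {r["patient_id"] for r in train} & {r["patient_id"] for r in test}
print("patients shared between train and test:", len(overlap))
```

A validation subset could be carved out the same way by calling `split_by_patient` again on the training records. Computing the mean and standard deviation on the pooled data before splitting would be an example of the leakage described above, because information from the test subset would then influence the training inputs.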
References
- 1. Burak Kocak, Ece Ates Kus, Ozgur Kilickesmez. How to read and review papers on machine learning and artificial intelligence in radiology: a survival guide to key methodological concepts. (2020) European Radiology. doi:10.1007/s00330-020-07324-4