Information leakage

Last revised by Yusra Sheikh on 2 Aug 2021

Information leakage is a common and important data-handling error in machine learning applications, including those in radiology. Briefly, it refers to incomplete separation of the training, validation, and testing datasets, which can significantly alter, and typically inflate, the apparent performance of the algorithm.

Since data overlap between datasets is a critical source of bias, it is crucial to split the data at the beginning of the study, before proceeding with any further steps (e.g. feature extraction), because performing these steps on the unsplit data can itself result in information leakage 1.
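
As an illustration only, the following Python sketch shows one way of splitting first and processing afterwards. It rests on assumptions not stated in the article: the toy records, the `patient_id` field, the 60/20/20 proportions, and the helper `split_by_patient` are all hypothetical, and splitting at the patient level is only one common choice when a patient contributes several images. Scaling statistics are computed from the training set alone and then reused for the other sets.

```python
import random
from statistics import mean, pstdev


def split_by_patient(records, fractions=(0.6, 0.2, 0.2), seed=0):
    """Hypothetical helper: split records into train/val/test by patient ID,
    so that no patient contributes data to more than one set."""
    patient_ids = sorted({r["patient_id"] for r in records})
    random.Random(seed).shuffle(patient_ids)
    n_train = int(len(patient_ids) * fractions[0])
    n_val = int(len(patient_ids) * fractions[1])
    groups = {
        "train": set(patient_ids[:n_train]),
        "val": set(patient_ids[n_train:n_train + n_val]),
        "test": set(patient_ids[n_train + n_val:]),
    }
    return {
        name: [r for r in records if r["patient_id"] in ids]
        for name, ids in groups.items()
    }


# Made-up toy data: 10 patients, 3 images each, one numeric feature per image.
rng = random.Random(42)
records = [
    {"patient_id": f"P{p:02d}", "image": i, "value": rng.gauss(100, 15)}
    for p in range(10)
    for i in range(3)
]

# Step 1: split before any other processing.
splits = split_by_patient(records)

# Step 2: derive scaling statistics from the training set only.
train_values = [r["value"] for r in splits["train"]]
mu, sigma = mean(train_values), pstdev(train_values)

# Step 3: apply the training-set statistics to every set.
scaled = {
    name: [(r["value"] - mu) / sigma for r in rows]
    for name, rows in splits.items()
}

# Patient IDs per set, used to check that the sets do not overlap.
ids = {name: {r["patient_id"] for r in rows} for name, rows in splits.items()}
overlap = (ids["train"] & ids["val"]) | (ids["train"] & ids["test"]) | (ids["val"] & ids["test"])
print("Patients shared between sets:", len(overlap))
```

Had the mean and standard deviation been computed over all records before splitting, information from the validation and testing sets would have influenced the training data, which is the kind of leakage described above.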
 
