The aggregation of an imaging data set is a critical step in building artificial intelligence (AI) for radiology. Imaging data sets are used in various ways, including training and/or testing algorithms. Many data sets for building convolutional neural networks for image identification contain at least thousands of images, but smaller data sets are useful for texture analysis, transfer learning, and other applications.
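When a data set is divided for training and testing, it is generally advisable to keep all images from one patient on the same side of the split, so that the test set measures performance on unseen patients. The following is a minimal sketch in Python, assuming each record is a (patient identifier, image path) pair. The function name `split_by_patient`, the 20% test fraction, and the example file names are hypothetical and are not taken from any of the data sets listed in this article.

```python
import random
from collections import defaultdict


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split (patient_id, image_path) pairs so no patient is in both sets."""
    # Group image paths under their patient identifier.
    by_patient = defaultdict(list)
    for patient_id, image_path in records:
        by_patient[patient_id].append(image_path)

    # Shuffle patients (not images) with a fixed seed for reproducibility.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)

    # Reserve a fraction of patients, at least one, for testing.
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])

    train = [(p, img) for p in patient_ids if p not in test_ids
             for img in by_patient[p]]
    test = [(p, img) for p in patient_ids if p in test_ids
            for img in by_patient[p]]
    return train, test


# Hypothetical example: 10 patients with 3 images each.
records = [(f"patient_{i % 10:02d}", f"images/img_{i:03d}.png")
           for i in range(30)]
train, test = split_by_patient(records)
print(len(train), "training images;", len(test), "test images")
```

Splitting on patients rather than on individual images is a design choice made for this sketch; other schemes, such as splitting by institution, may suit a given project better.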
External links
Many commercial AI products are built on proprietary data sets or specific hospital data sets that are not publicly available because of concerns over patient privacy. There are, however, several data sets of radiological images and/or reports publicly available at the following websites:
1000 Functional Connectomes Project: over 1000 functional MRI exams collected from sites across the globe
ACR Data Science: list of ~20 data sets
CANDID-PTX dataset: 19,237 chest X-ray DICOM images from New Zealand with segmentation labels for pneumothorax, rib fractures, and chest tubes, and corresponding free-text reports
CheXpert: 224,316 chest radiographs
Computed Tomography Emphysema Database: small images specifically for texture analysis
COVID-19 Open Annotated Radiology Database (RICORD): expert-annotated COVID-19 imaging dataset with 1000 chest X-rays and 240 thoracic CT exams
Johns Hopkins University Data Archive: contains a data set of head CT scans
MD.ai: a collection of public projects
NIH CXR8: 112,120 frontal chest radiographs
OpenI - The Open Access Biomedical Image Search Engine: data set search engine with an application programming interface (API) to create customized data sets available at MedPix
OpenNeuro: list of over 200 neuro data sets
OASIS: open access neuro data sets
Spineweb: 16 spinal imaging data sets
MRNet: 1,370 annotated knee MRI examinations
MURA: a large dataset of musculoskeletal radiographs
MIMIC-CXR Database: 377,110 chest radiographs with free-text radiology reports
PADCHEST: 160,000 chest X-rays with multiple labels on images
RSNA Pulmonary Embolism CT (RSPECT) dataset: 12,000 CT studies
RSNA 2019 Brain CT Hemorrhage dataset: 25,312 CT studies
UC Irvine Machine Learning Repository: various radiological and nuclear medicine data sets among other types of data sets
York Cardiac MRI Dataset: cardiac MRIs
The Visible Human Project Dataset: CT, MRI and cryosectional images of complete cadavers
Zenodo: searchable projects
Additionally, The Cancer Imaging Archive contains links to many open radiology data sets including the following:
Osteosarcoma data from UT Southwestern/UT Dallas for Viable and Necrotic Tumor Assessment
Synthetic and Phantom MR Images for Determining Deformable Image Registration Accuracy (MRI-DIR)
The Cancer Genome Atlas Breast Invasive Carcinoma Collection (TCGA-BRCA)
The Cancer Genome Atlas Cervical Kidney Renal Papillary Cell Carcinoma Collection (TCGA-KIRP)
The Cancer Genome Atlas Colon Adenocarcinoma Collection (TCGA-COAD)
The Cancer Genome Atlas Esophageal Carcinoma Collection (TCGA-ESCA)
The Cancer Genome Atlas Glioblastoma Multiforme Collection (TCGA-GBM)
The Cancer Genome Atlas Head-Neck Squamous Cell Carcinoma Collection (TCGA-HNSC)
The Cancer Genome Atlas Kidney Chromophobe Collection (TCGA-KICH)
The Cancer Genome Atlas Kidney Renal Clear Cell Carcinoma Collection (TCGA-KIRC)
The Cancer Genome Atlas Liver Hepatocellular Carcinoma Collection (TCGA-LIHC)
The Cancer Genome Atlas Low Grade Glioma Collection (TCGA-LGG)
The Cancer Genome Atlas Lung Adenocarcinoma Collection (TCGA-LUAD)
The Cancer Genome Atlas Lung Squamous Cell Carcinoma Collection (TCGA-LUSC)
The Cancer Genome Atlas Prostate Adenocarcinoma Collection (TCGA-PRAD)
The Cancer Genome Atlas Rectum Adenocarcinoma Collection (TCGA-READ)
The Cancer Genome Atlas Stomach Adenocarcinoma Collection (TCGA-STAD)
The Cancer Genome Atlas Thyroid Cancer Collection (TCGA-THCA)
The Cancer Genome Atlas Urothelial Bladder Carcinoma Collection (TCGA-BLCA)
The Cancer Genome Atlas Uterine Corpus Endometrial Carcinoma Collection (TCGA-UCEC)
If any of these links are broken, or for other problems or questions, please contact [email protected].