Clustering

Last revised by Candace Makeda Moore on 9 May 2024

Clustering, also known as cluster analysis, is a machine learning technique designed to group similar data points together. Because the data points do not need to be labeled, clustering is an example of unsupervised learning. Clustering in machine learning should not be confused with the detection of disease clusters in epidemiology.

Many algorithms have been developed for clustering, and the effectiveness of each depends largely on the size of the dataset and the distribution of the data points. Algorithms commonly used for clustering in radiology, where they have been applied for decades to the task of segmentation, include fuzzy C-means clustering and K-means clustering 1,2. K-means, one of the most popular clustering algorithms, seeks to partition a dataset into K clusters. An example of a more advanced algorithm is Density-Based Spatial Clustering of Applications with Noise (DBSCAN), which is more effective for data distributed in a non-Gaussian manner, such as clusters of irregular shape.
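To make the K-means idea concrete, the following is a minimal sketch in plain Python of the standard assign-then-update loop (often called Lloyd's algorithm). The function names `kmeans` and `squared_distance`, the `init` parameter, and the toy points are illustrative assumptions rather than part of any particular library, and the starting centroids are fixed by hand here so the run is reproducible. Practical implementations usually choose the starting centroids randomly or with a scheme such as k-means++.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Group `points` (a list of tuples) into k clusters.

    `init` is an optional list of k starting centroids (an assumption made
    for reproducibility); if omitted, k data points are sampled at random.
    Returns (centroids, labels).
    """
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point takes the label of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:  # nothing moved, so the loop has converged
            break
        centroids = updated
    return centroids, labels


# Toy data (hypothetical): two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2, init=[points[0], points[3]])
print(labels)
print(centroids)
```

With this data the first three points share one label and the last three share the other. Note that K has to be chosen in advance, and on less cleanly separated data the result can depend on the starting centroids.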

In radiology (as well as pathology), clustering groups data, which may correspond to sets of pixels or voxels within images, whole images, reports or patients, by similarity across various attributes or features, without being told in advance which labels to group by. Clustering therefore has the potential to reveal similarities in data that humans have overlooked.

In practice, clustering has proven useful in segmentation algorithms for radiology, which are used to identify different tissue types and/or differentiate pathological from normal tissue. Clustering algorithms are also being researched in other areas, such as natural language processing of reports 3.
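As a rough illustration of clustering-based segmentation, the sketch below groups the pixels of a tiny grayscale image by intensity alone, using a one-dimensional K-means, and returns a label mask of the same shape. The function name `segment_by_intensity`, the evenly spaced starting centres, and the pixel values are all hypothetical choices for the example. Real segmentation pipelines typically use richer features than raw intensity and operate on far larger images.

```python
def segment_by_intensity(image, k=2, max_iter=50):
    """Cluster pixel intensities with 1-D K-means (requires k >= 2).

    `image` is a list of equal-length rows of numbers.
    Returns (mask, centres), where mask has the same shape as image and
    holds the cluster index of each pixel.
    """
    pixels = [value for row in image for value in row]
    lo, hi = min(pixels), max(pixels)
    # Assumption: start with k centres spread evenly over the intensity range.
    centres = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assign every pixel to the centre with the closest intensity.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in pixels]
        # Move each centre to the mean intensity of its assigned pixels.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(pixels, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centres[c])
        if updated == centres:
            break
        centres = updated
    # Fold the flat label list back into rows to form the mask.
    width = len(image[0])
    mask = [labels[i:i + width] for i in range(0, len(labels), width)]
    return mask, centres


# Hypothetical 3x4 image: a dark background with a bright region on the right.
image = [
    [10, 12, 11, 200],
    [9, 13, 210, 205],
    [11, 10, 198, 202],
]
mask, centres = segment_by_intensity(image, k=2)
for row in mask:
    print(row)
```

Here label 0 marks the darker cluster and label 1 the brighter one. No pixel was labeled beforehand, so the two groups emerge from the intensity values alone.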
