**Cross entropy** is a measure of the dissimilarity between two probability distributions. In the context of supervised learning, one of these distributions represents the “true” label for a training example, where the correct response is assigned a probability of 100%.

#### Machine learning

If p(x) represents the probability distribution of “true” labels from a training example and q(x) represents the “guess” of the machine learning algorithm, the cross-entropy is calculated as follows:

H(p,q) = −∑ p(x) log(q(x))

Here, x represents the outcomes that the machine learning algorithm attempts to predict, and the sum runs over all of these outcomes. In radiology, this may be the presence or absence of a pathology (e.g. “fracture” vs “no fracture”) or may be a list of possible pathologies (e.g. “malignancy”, “pneumonia”, etc.).

As the prediction more closely approximates the “correct” answer, the cross-entropy decreases toward its minimum. Supervised machine learning algorithms seek to adjust network parameters such that the cross-entropy is minimized across training examples; in other words, such that the predictions q(x) approximate p(x) as closely as possible.
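The behaviour described above can be illustrated with a minimal Python sketch. It assumes the conventional definition with a leading negative sign and the natural logarithm; the function name `cross_entropy` and the example confidence values (0.6, 0.8, 0.95) are hypothetical and chosen only for illustration.

```python
import math


def cross_entropy(p, q):
    """Return H(p, q) = -sum over x of p(x) * ln q(x) for two discrete distributions.

    p and q are sequences of probabilities listed in the same outcome order.
    Assumption: natural logarithm, so the result is in nats.
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same outcomes")
    total = 0.0
    for p_x, q_x in zip(p, q):
        if p_x == 0.0:
            continue  # outcomes with p(x) = 0 contribute nothing to the sum
        total -= p_x * math.log(q_x)
    return total


# "True" label: the first outcome is correct with probability 100%.
true_label = [1.0, 0.0]

# Hypothetical predictions that move progressively closer to the true label.
for confidence in (0.6, 0.8, 0.95):
    prediction = [confidence, 1.0 - confidence]
    print(f"q = {confidence:.2f}  H(p, q) = {cross_entropy(true_label, prediction):.3f}")
```

The printed cross-entropy falls as the predicted probability of the correct outcome rises. Note that this sketch raises an error if the prediction assigns a probability of exactly 0 to the correct outcome, since the logarithm of 0 is undefined; practical implementations usually guard against this case.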

#### Applications in radiology

Consider an algorithm that seeks to classify chest x-rays as either “pneumonia” or “no pneumonia”. The algorithm is given a chest x-ray known to be from a patient with pneumonia. Assume that the algorithm predicts a 51% chance that the chest x-ray represents pneumonia. In this case:

p(Pneumonia) = 1.00

p(no Pneumonia) = 0.00

q(Pneumonia) = 0.51

q(no Pneumonia) = 0.49

This yields a cross-entropy of approximately 0.67 (using the natural logarithm). Conversely, a more accurate algorithm that predicts a 98% probability of pneumonia gives a lower cross-entropy of approximately 0.02.
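The arithmetic behind these two figures can be written out as follows. This is a sketch that assumes the natural logarithm and the conventional negative sign in the definition, which is the choice consistent with the values quoted above.

```latex
\begin{align*}
H(p, q) &= -\bigl[\, p(\text{Pneumonia}) \ln q(\text{Pneumonia})
            + p(\text{no Pneumonia}) \ln q(\text{no Pneumonia}) \,\bigr] \\
        &= -\bigl[\, 1.00 \cdot \ln(0.51) + 0.00 \cdot \ln(0.49) \,\bigr] \\
        &= -\ln(0.51) \approx 0.67
\end{align*}

% Same calculation for the more accurate prediction q(Pneumonia) = 0.98:
\begin{align*}
H(p, q) &= -\bigl[\, 1.00 \cdot \ln(0.98) + 0.00 \cdot \ln(0.02) \,\bigr] \\
        &= -\ln(0.98) \approx 0.02
\end{align*}
```

Because the “true” label assigns a probability of 0 to “no pneumonia”, only the term for the correct outcome contributes to the sum.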