
Kappa

Last revised by Dr Candace Makeda Moore on 10 Sep 2020

Kappa is a nonparametric statistic that can be used to measure interobserver agreement on imaging studies. Cohen's kappa compares two observers or, in a machine learning setting, an algorithm's output against reference labels. Fleiss' kappa assesses interobserver agreement between more than two observers.
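
A minimal sketch in Python, assuming scikit-learn is installed; the two sets of reads below are made up for illustration:

from sklearn.metrics import cohen_kappa_score

# Hypothetical binary reads from two observers (1 = finding present, 0 = absent)
reader_a = [1, 0, 1, 1, 0, 0, 1, 0]
reader_b = [1, 0, 0, 1, 0, 1, 1, 0]

# Cohen's kappa for the two readers (the same call would compare an
# algorithm's output against reference labels)
print(f"Cohen's kappa: {cohen_kappa_score(reader_a, reader_b):.2f}")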

When comparing two observers, the concept behind the test is similar to the chi-squared test: two 2 x 2 tables are set up, one containing the expected values if agreement were due to chance alone and one containing the observed data. Kappa indicates how much of the observed interobserver agreement goes beyond what chance alone would produce.

To find the expected values, take the product of the marginal totals, where O1 is the number of cases both observers call positive, O4 the number both call negative, and O2 and O3 the two discordant cells:

  • expected value for the +/+ cell: [(O1 + O2) x (O1 + O3)] / total observations
  • expected value for the -/- cell: [(O3 + O4) x (O2 + O4)] / total observations
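
As a worked sketch of the calculation (the counts below are hypothetical, with O1 = both observers positive, O4 = both negative, and O2, O3 the discordant cells), kappa is the observed agreement minus the chance agreement, divided by one minus the chance agreement:

O1, O2, O3, O4 = 40, 10, 5, 45
n = O1 + O2 + O3 + O4

# Chance-expected counts for the agreement cells (product of the marginals / n)
expected_pos = (O1 + O2) * (O1 + O3) / n   # expected +/+ count
expected_neg = (O3 + O4) * (O2 + O4) / n   # expected -/- count

observed_agreement = (O1 + O4) / n                    # proportion observed to agree
chance_agreement = (expected_pos + expected_neg) / n  # proportion expected by chance

kappa = (observed_agreement - chance_agreement) / (1 - chance_agreement)
print(f"kappa = {kappa:.2f}")  # 0.70 for these counts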

Rating systems for kappa are controversial, as they cannot be proven, but one system classifies kappa values as follows (see the sketch after this list):

  • >0.75: excellent
  • 0.40-0.75: fair to good
  • <0.40: poor
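
A small helper reflecting these cut-offs (the thresholds are those quoted above, not a universal standard):

def rate_kappa(kappa: float) -> str:
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "poor"

print(rate_kappa(0.70))  # fair to good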

Kappa can be extended to three or more readers using more elaborate equations (Fleiss' kappa). In that setting it assesses whether all readers agree on a finding, which is more stringent.
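
One way to compute Fleiss' kappa in Python is with the statsmodels package; a sketch with made-up categorical reads from three readers (statsmodels.stats.inter_rater is assumed to be available):

import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical reads: rows = cases, columns = readers, values = assigned category
reads = np.array([
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
])

# Convert per-reader assignments into a cases x categories count table
counts, _categories = aggregate_raters(reads)
print(f"Fleiss' kappa: {fleiss_kappa(counts):.2f}")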

Kappa is used for categorical variables (e.g. larger vs. smaller, condition present vs. absent). Bland-Altman analysis is used to assess agreement for continuous variables.
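
As a contrast, a minimal Bland-Altman sketch for continuous measurements (e.g. lesion diameters in mm from two readers; the values are hypothetical):

import numpy as np

reader_a = np.array([12.1, 8.4, 15.0, 22.3, 9.8, 30.1])
reader_b = np.array([11.8, 8.9, 14.2, 23.0, 10.1, 29.5])

diff = reader_a - reader_b
bias = diff.mean()                    # mean difference (systematic bias) between readers
half_width = 1.96 * diff.std(ddof=1)  # half-width of the 95% limits of agreement

print(f"bias = {bias:.2f} mm, limits of agreement = "
      f"[{bias - half_width:.2f}, {bias + half_width:.2f}] mm")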


Cases and figures

  • Figure 1