Natural language processing

Last revised by Andrew Murphy on 6 Jun 2023

Natural language processing (NLP) is an area of active research in artificial intelligence concerned with human languages. Natural language processing programs use human-written text or human speech as data for analysis. Their goals range from generating insights from texts or recorded speech to generating text or speech.

The first area of natural language processing to gain wide usage in radiology was speech recognition. In earlier literature, speech recognition was often referred to as voice recognition 1-3, but the trend in nomenclature is towards differentiating voice recognition and speech recognition, with only the latter implying the use of dictated recordings to create reports. In many radiology practices, radiologists use speech recognition programs to create reports routinely.

Increasing research in artificial neural networks has sparked interest in natural language processing algorithms for topic modeling, which can be used to automate the labeling of images. One example is the NIH chest x-ray dataset ChestX-ray8 3.

Due to the brevity, limited vocabulary, and structured nature of radiology reports, many different algorithm types have proven successful at annotation of radiology reports.
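
As an illustration of why short reports with a limited vocabulary are tractable, the following is a minimal sketch of a keyword-based annotator with naive negation handling. The finding vocabulary, the negation cues and the function name `label_report` are hypothetical choices made for this example and do not come from any cited system.

```python
import re

# Hypothetical finding vocabulary; a real system would use a curated lexicon.
FINDING_TERMS = {
    "pneumothorax": ["pneumothorax"],
    "effusion": ["pleural effusion", "effusion"],
    "cardiomegaly": ["cardiomegaly", "enlarged heart"],
}

# Naive negation cues that must appear before the term in the same sentence.
NEGATION_CUES = re.compile(r"\b(no|without|negative for|free of)\b")


def label_report(report: str) -> dict:
    """Return {finding: bool} for every finding mentioned in the report.

    True means at least one mention was not negated; False means every
    mention was negated. Findings that are never mentioned are omitted.
    """
    labels = {}
    for sentence in re.split(r"[.;\n]+", report.lower()):
        for finding, terms in FINDING_TERMS.items():
            for term in terms:
                match = re.search(r"\b" + re.escape(term) + r"\b", sentence)
                if match is None:
                    continue
                # Look for a negation cue in the text preceding the term.
                prefix = sentence[: match.start()]
                negated = NEGATION_CUES.search(prefix) is not None
                labels[finding] = labels.get(finding, False) or not negated
                break  # one matching term per finding per sentence is enough
    return labels


example = "No pneumothorax. Small left pleural effusion. Heart size is normal."
print(label_report(example))
```

A rule set this small will miss uncertainty ("cannot exclude"), negation that follows the term, and synonyms outside the vocabulary, which is one reason statistical methods are also used.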

Active research on the application of natural language processing in radiology includes natural language understanding (NLU) tasks such as topic modeling, other forms of information extraction, and keyword searching. Natural language processing also includes natural language generation (NLG).

Large language models

Traditionally, natural language processing relied on recurrent neural networks until around 2017, when researchers from Google Brain published a paper exploring the use of transformers 6. Transformers improve efficiency through an attention mechanism, which lets the model focus on different parts of the input sequence while encoding or decoding it.
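
The ability to focus on different parts of the input sequence is commonly implemented as scaled dot-product attention. The sketch below shows that single operation in plain Python on toy vectors. It assumes the queries, keys and values are already given, and it omits the learned projections, multiple heads and positional information that a full transformer uses. The function names are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Returns the attended outputs and the attention weights per query.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        output = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Toy example: one query that resembles the first key more than the second.
outputs, weights = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
print(weights[0], outputs[0])
```

Because each query is compared with every key independently, the comparisons can run in parallel, unlike the step-by-step processing of a recurrent network.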

Generative pretrained transformer models such as ChatGPT have made a significant impact on natural language processing due to their efficiency and ability to replicate human writing 7.

Practical points

Several organizations have undertaken efforts to standardize radiology reports 5. One byproduct of standardized reports is that they are more amenable to rule-based and/or decision tree algorithms for NLP. At present, however, much progress has been made in interpreting free text with algorithms that apply statistical operations to matrices derived from texts, such as latent Dirichlet allocation.
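
To make "matrices derived from texts" concrete, the sketch below builds a document-term count matrix from a few free-text reports, which is the kind of input a topic model such as latent Dirichlet allocation is typically fitted on. Fitting the topic model itself is not shown. The tokenizer, the function name `document_term_matrix` and the sample reports are assumptions made for this example.

```python
import re
from collections import Counter


def document_term_matrix(reports):
    """Return (vocabulary, matrix) where matrix[i][j] counts how often
    vocabulary[j] occurs in reports[i]."""
    # Very simple tokenizer: lowercase alphabetic runs only.
    tokenized = [re.findall(r"[a-z]+", report.lower()) for report in reports]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


reports = [
    "No pleural effusion.",
    "Small pleural effusion, no pneumothorax.",
]
vocabulary, matrix = document_term_matrix(reports)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row describes one report and each column one term, so word order is discarded. Statistical methods then look for groups of terms that tend to occur together across reports.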
