Artificial neural networks are a powerful type of model capable of processing many types of data. Although initially inspired by the connections within biological neural networks, modern artificial neural networks bear only a slight, high-level resemblance to their biological counterparts. Nonetheless, the analogy remains conceptually useful and is reflected in some of the terminology used. Individual 'neurons' in the network receive variably weighted input from numerous other neurons in the more superficial layers. Activation of any single neuron depends on the cumulative input from these more superficial neurons. Each neuron, in turn, connects to many deeper neurons, again with variable weightings.
There are two broad types of neural networks:
- fully connected networks
  - a simple kind of neural network in which every neuron in one layer is connected to every neuron in the next layer
- recurrent neural networks
  - a neural network in which part or all of the output from the previous step is used as input for the current step; this is very useful for working with sequences of connected information, for example, videos
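The two connection patterns described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`dense_layer`, `rnn_step`, `run_rnn`), the choice of `tanh` as the activation and the weight values are all assumptions made for the example.

```python
import math

def dense_layer(inputs, weights, biases):
    """Fully connected layer: every output neuron sees every input neuron."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def rnn_step(x_t, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One recurrent step: the new state mixes the current input
    with the state produced at the previous step."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(sequence, h0=0.0):
    """Feed a sequence through the recurrent unit, one element at a time."""
    states = []
    h = h0
    for x_t in sequence:
        h = rnn_step(x_t, h)  # previous output is fed back in
        states.append(h)
    return states

# Only the first element of the sequence is non-zero, yet the later
# states are non-zero too, because each step reuses the previous state.
states = run_rnn([1.0, 0.0, 0.0])
```

In this sketch the recurrent unit 'remembers' the first input at later steps, which is the property that makes recurrent networks suited to sequential data.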
The usefulness of neural networks stems from the fact that they are universal function approximators, meaning that given the appropriate parameters, they can represent a wide variety of interesting and dissimilar functions.
Furthermore, they are differentiable mathematical functions: for a given set of parameters, inputs and labels, one can compute the gradient of a defined loss function with respect to the parameters. This gradient indicates how the parameters should be altered in order to improve predictions.
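As a rough illustration of how a gradient guides parameter updates, the sketch below fits a two-parameter linear model by gradient descent on a mean squared error loss. The linear model stands in for a full network only to keep the gradient short enough to write by hand; the data, the learning rate and all function names are assumptions made for the example.

```python
def predict(w, b, x):
    """A tiny 'model' with two parameters, w and b."""
    return w * x + b

def mse_loss(w, b, xs, ys):
    """Mean squared error between predictions and labels."""
    n = len(xs)
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / n

def mse_gradient(w, b, xs, ys):
    """Gradient of the loss with respect to each parameter."""
    n = len(xs)
    dw = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
    return dw, db

# Toy inputs and labels generated from y = 2x + 1 (illustrative only)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = 0.0, 0.0
learning_rate = 0.05
initial_loss = mse_loss(w, b, xs, ys)

for _ in range(2000):
    dw, db = mse_gradient(w, b, xs, ys)
    # Step each parameter against its gradient to reduce the loss
    w -= learning_rate * dw
    b -= learning_rate * db

final_loss = mse_loss(w, b, xs, ys)
```

In practice, the gradients of a multi-layer network are usually obtained by backpropagation through an automatic differentiation library rather than derived by hand.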
Artificial neural networks can be broadly divided into feedforward and recurrent architectures (discussed separately).
Feedforward neural networks are more readily conceptualised in 'layers'. The first layer of the neural network is merely the inputs of each sample, and each neuron in each successive layer is connected to a set of neurons in the preceding layer.
To compute the function represented by the network, we calculate the activation of each neuron by applying a non-linear activation function (for example, a sigmoid function or a rectified linear unit) to the weighted sum of the activations of the connected neurons in the preceding layer. These weights represent the information stored by the neural network and are the parameters that we update during training. The activations of the final layer are the output of the network.
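The computation described above can be sketched as follows for a small fully connected feedforward network with a sigmoid activation. The layer sizes, weight values and function names are illustrative assumptions, as is the bias term added to each weighted sum, which is common in practice but not mentioned in the text.

```python
import math

def sigmoid(z):
    """Non-linear activation that squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(activations, weights, biases):
    """Activation of each neuron: sigmoid of the weighted sum of the
    preceding layer's activations (plus an assumed bias term)."""
    return [
        sigmoid(sum(w * a for w, a in zip(neuron_weights, activations)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Propagate the inputs through each layer in turn."""
    activations = inputs  # the first layer is simply the inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations  # final-layer activations are the output

# Hypothetical weights: 2 inputs -> 3 hidden neurons -> 1 output neuron
layers = [
    ([[0.5, -0.5], [1.0, 1.0], [0.0, 0.0]], [0.0, -1.0, 0.0]),
    ([[1.0, -1.0, 0.5]], [0.0]),
]

output = network_forward([1.0, 1.0], layers)
```

Each inner list of weights belongs to one neuron, so changing the shape of these lists changes how many neurons each layer has.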
The choice of how neurons in each layer are connected to those in the preceding layer strongly influences the abilities of the network and constitutes what we normally refer to as the 'architecture' of the network. Common architectures include fully connected neural networks and convolutional neural networks.