Gradient descent
Gradient descent is an optimization algorithm that aims to minimize a model's cost function (a measure of its prediction error) in order to produce the model that gives the most accurate predictions. It is one of the most commonly used optimization algorithms in machine learning, and is often the first one people learn because of its simplicity and usefulness.
As the name suggests, gradient descent can be seen as descending to the lowest point of a surface in an n-dimensional space, where each dimension corresponds to one model parameter. The lowest point represents the minimum value of the cost function. Gradient descent uses the derivative of the cost function with respect to the parameters (the gradient) and changes the parameters in small steps in the direction opposite to the gradient, each time moving to a point with a smaller cost. The size of each step is determined by the learning rate. Eventually, gradient descent converges to a point where the gradient is close to 0, which corresponds to a minimum of the cost function (for non-convex cost functions this may be a local rather than the global minimum). The parameters at that point are taken as the fitted model because they give the lowest prediction error found.
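The descent described above can be illustrated with a minimal sketch in Python. The cost function (x - 3)^2, the starting point, the learning rate, and the stopping tolerance are all assumptions chosen for illustration, and the function names are hypothetical rather than taken from any library.

```python
def gradient_descent(grad, start, learning_rate=0.1, tolerance=1e-8, max_steps=10_000):
    """Follow the negative gradient from `start` until the gradient is close to 0."""
    x = start
    for _ in range(max_steps):
        g = grad(x)
        if abs(g) < tolerance:   # gradient close to 0: treat as converged
            break
        x -= learning_rate * g   # small step in the direction opposite to the gradient
    return x


# Illustrative cost function with a single minimum at x = 3 (an assumption).
def cost(x):
    return (x - 3.0) ** 2


# Its derivative, which is the gradient in one dimension.
def cost_gradient(x):
    return 2.0 * (x - 3.0)


minimum = gradient_descent(cost_gradient, start=0.0)
print(minimum, cost(minimum))
```

With these settings the result lands very close to x = 3, where the cost is smallest. A learning rate that is too large can overshoot the minimum and fail to converge, while one that is too small makes progress slow.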
Gradient descent is an important consideration when designing the training phase of a machine learning algorithm, because training consists of repeatedly computing the gradient and updating the parameters in a loop.
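As a rough sketch of such a training loop, the following Python example fits a straight line y = w*x + b by gradient descent on the mean squared error. The toy data, the learning rate, the number of epochs, and the function name `train_linear_model` are assumptions made for illustration only.

```python
def train_linear_model(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every training example with the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Parameter update: step against the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = train_linear_model(xs, ys)
print(w, b)
```

Here every update uses the whole data set. Variants that use one example or a small batch per update follow the same loop structure and differ only in how many examples contribute to each gradient.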