Stochastic gradient descent
Stochastic gradient descent is an optimization algorithm that improves on the efficiency of batch gradient descent. Like batch gradient descent, it performs a series of steps to minimize a cost function. Batch gradient descent is computationally expensive on large data sets because every step uses the whole data set. Stochastic gradient descent instead takes many cheaper steps, each based on a single example, and reaches an approximately equivalent result.
After the data set has been randomly shuffled, stochastic gradient descent computes the gradient from a single example and immediately updates the model parameters, so the cost function starts to change after the very first example. Each update requires far less computation than in batch gradient descent, which iterates through all the examples in the data set before making a single update to reduce the cost function.
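The procedure described above can be sketched in Python for a simple linear regression model with a squared-error cost. This is a minimal illustration and not a reference implementation: the function name `sgd_linear_regression`, the learning rate, the number of epochs and the synthetic data are all assumptions chosen for the example.

```python
import numpy as np

def sgd_linear_regression(X, y, learning_rate=0.01, epochs=50, seed=0):
    """Fit y ~ X @ weights + bias using one example per update (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_examples, n_features = X.shape
    weights = np.zeros(n_features)
    bias = 0.0
    for _ in range(epochs):
        # Shuffle the order of the examples at the start of every pass.
        for i in rng.permutation(n_examples):
            # Prediction error for a single example.
            error = X[i] @ weights + bias - y[i]
            # Gradient of 0.5 * error**2 with respect to the parameters,
            # applied immediately rather than after the whole data set.
            weights -= learning_rate * error * X[i]
            bias -= learning_rate * error
    return weights, bias

# Synthetic data (assumed for illustration): y = 3x + 1 plus a little noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=200)

weights, bias = sgd_linear_regression(X, y)
print(weights, bias)
```

In this sketch the parameters are updated 200 times per pass through the data, whereas batch gradient descent would update them once per pass after averaging the gradient over all 200 examples.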
However, because stochastic gradient descent processes only one example per iteration, there is no guarantee that the cost function decreases with every step. Over time, the general direction of the updates moves the parameters close to the minimum. Stochastic gradient descent does not always settle exactly at the minimum of the cost function; instead, it may continue to oscillate around it. With large data sets containing millions of examples, and after a reasonable number of iterations, the value of the cost function will be so close to the minimum that any difference is negligible.
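This behaviour can be observed by recording the cost over the whole data set after every single-example update. The following sketch assumes a linear regression model with a mean squared-error cost. The helper names, the learning rate and the synthetic data are illustrative assumptions, and the closed-form least-squares solution is used only as a reference for the true minimum.

```python
import numpy as np

def mean_squared_cost(X, y, w, b):
    """Half the mean squared error over the whole data set."""
    return float(np.mean((X @ w + b - y) ** 2) / 2)

def sgd_cost_history(X, y, learning_rate=0.05, epochs=20, seed=0):
    """Run single-example updates and record the full cost after each one."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    history = [mean_squared_cost(X, y, w, b)]
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            error = X[i] @ w + b - y[i]
            w -= learning_rate * error * X[i]
            b -= learning_rate * error
            history.append(mean_squared_cost(X, y, w, b))
    return np.array(history)

# Synthetic noisy data (assumed for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 1))
y = 2.0 * X[:, 0] - 0.5 + rng.normal(scale=0.5, size=100)

history = sgd_cost_history(X, y)

# Number of individual steps after which the overall cost went up.
increases = int(np.sum(np.diff(history) > 0))

# Reference minimum from the closed-form least-squares solution.
A = np.column_stack([X, np.ones(len(y))])
solution, *_ = np.linalg.lstsq(A, y, rcond=None)
minimum_cost = mean_squared_cost(X, y, solution[:1], solution[1])

print("steps that increased the cost:", increases)
print("final cost:", history[-1], "minimum cost:", minimum_cost)
```

Under these assumptions, some individual steps raise the overall cost, yet the final cost ends up only slightly above the least-squares minimum, which matches the oscillation around the minimum described above.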