Stochastic gradient descent
Stochastic gradient descent is an optimization algorithm that improves on the efficiency of batch gradient descent. Like batch gradient descent, it performs a series of steps to minimize a cost function. Unlike batch gradient descent, which is computationally expensive on large data sets because every step uses all of the examples, stochastic gradient descent takes much cheaper steps, each based on a single example, and reaches a comparable result.
After the data set is randomly shuffled, stochastic gradient descent computes the gradient from a single example and immediately updates the model parameters, so the cost function starts to change after the first example. Each update takes far less computation than in batch gradient descent, which iterates through every example in the data set before making a single update to reduce the cost function.
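The per-example update described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes a one-variable linear model with a squared-error cost, and the function name `sgd_linear_regression`, the toy data, and the learning rate are all chosen here for the example.

```python
import random


def sgd_linear_regression(xs, ys, learning_rate=0.01, epochs=50, seed=0):
    """Fit y ~ w * x + b, updating the parameters after every single example."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        # Randomize the order in which the examples are visited.
        rng.shuffle(indices)
        for i in indices:
            # Prediction error on ONE example only.
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error**2 with respect to w and b,
            # applied immediately instead of waiting for the full data set.
            w -= learning_rate * error * xs[i]
            b -= learning_rate * error
    return w, b


# Hypothetical toy data generated from y = 2x + 1 without noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = sgd_linear_regression(xs, ys, learning_rate=0.05, epochs=500)
print(w, b)  # expected to land close to 2 and 1
```

A batch version would instead sum the gradient over all five examples before changing `w` and `b` once; here the parameters change five times per pass through the data.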
However, because only one example is processed per iteration, there is no guarantee that the cost function decreases with every step. Over time, the general direction of the updates moves the parameters close to the minimum. Stochastic gradient descent does not always converge exactly to the minimum of the cost function; instead, it may keep oscillating around it. With large data sets containing millions of examples, and after a reasonable number of iterations, the value of the cost function is usually so close to the minimum that any difference is negligible.
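The non-monotonic behaviour can be observed by recording the cost over the whole data set after every single-example update. The sketch below is illustrative only: the synthetic noisy data, the helper `mean_squared_cost`, the fixed learning rate, and the choice to draw examples at random with replacement (rather than shuffling once) are assumptions made for the example.

```python
import random


def mean_squared_cost(w, b, xs, ys):
    """Cost over the WHOLE data set: half the mean squared error."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))


rng = random.Random(0)
# Hypothetical noisy data scattered around the line y = 2x + 1.
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]

w, b = 0.0, 0.0
learning_rate = 0.1
costs = [mean_squared_cost(w, b, xs, ys)]

for _ in range(2000):
    i = rng.randrange(len(xs))          # pick one example at random
    error = (w * xs[i] + b) - ys[i]     # error on that example only
    w -= learning_rate * error * xs[i]
    b -= learning_rate * error
    costs.append(mean_squared_cost(w, b, xs, ys))

# Count the updates after which the full cost went up instead of down.
increases = sum(1 for prev, cur in zip(costs, costs[1:]) if cur > prev)
print(f"initial cost: {costs[0]:.3f}, final cost: {costs[-1]:.3f}")
print(f"steps where the cost increased: {increases} of {len(costs) - 1}")
```

Individual steps raise the cost because each one follows the gradient of a single noisy example, yet the overall trend is downward. With a fixed learning rate the cost keeps fluctuating near the minimum rather than settling on it.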