Stochastic gradient descent

Last revised by Andrew Murphy on 23 Jul 2019

Stochastic gradient descent is an optimization algorithm that improves the efficiency of the gradient descent algorithm. Like batch gradient descent, stochastic gradient descent performs a series of steps to minimize a cost function. Unlike batch gradient descent, which becomes computationally expensive on large data sets because every step uses the entire data set, stochastic gradient descent takes cheaper steps based on individual examples and can therefore reach a comparable result far more efficiently.

After randomizing (shuffling) the data set, stochastic gradient descent computes the gradient from a single example and updates the parameters immediately, so the cost function begins to change after the very first example. This requires far less computation per update than batch gradient descent, which iterates through all the examples in the data set before making a single update to reduce the cost function.
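As a rough sketch of this idea, the Python snippet below applies stochastic gradient descent to linear regression with a squared-error cost. The function name, learning rate, and number of epochs are all illustrative assumptions, not part of any particular library; the point is only that the data set is shuffled and the parameters are updated after each individual example.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=10, seed=0):
    """Illustrative stochastic gradient descent for linear regression.

    Parameters are updated after each training example rather than
    after a full pass over the data set.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    theta = np.zeros(n_features)

    for _ in range(epochs):
        # Randomize (shuffle) the order of the examples before each pass.
        order = rng.permutation(n_samples)
        for i in order:
            # Gradient of the squared error for this single example.
            error = X[i] @ theta - y[i]
            grad = error * X[i]
            # Update the parameters immediately, using only this example.
            theta -= lr * grad
    return theta
```

Shuffling before each pass is what keeps the single-example updates from following the same fixed order every time, which is why the data set is randomized first.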

Because stochastic gradient descent processes only one example per iteration, however, there is no guarantee that the cost function decreases with every step. Over many iterations, its general direction still converges towards the minimum. Stochastic gradient descent does not usually settle exactly at the minimum of the cost function; instead, it keeps oscillating in a small region around it. On large data sets with millions of examples, after a reasonable number of iterations the value of the cost function will be so close to the minimum that any difference is negligible.
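As a rough illustration of this behaviour, the sketch below (using synthetic data and a fixed learning rate, both of which are assumptions) records the full cost after every single-example update. The recorded sequence is not monotone, but it trends downward and settles near the minimum.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: a bias column plus one feature, with a known linear target.
X = np.c_[np.ones(200), rng.uniform(-1, 1, 200)]
y = X @ np.array([2.0, -3.0]) + rng.normal(0, 0.1, 200)

theta = np.zeros(2)
lr = 0.05
costs = []
for i in rng.permutation(len(y)):
    # One single-example update of stochastic gradient descent.
    theta -= lr * (X[i] @ theta - y[i]) * X[i]
    # Full squared-error cost, recorded only to show how it fluctuates.
    costs.append(np.mean((X @ theta - y) ** 2) / 2)

# costs does not decrease at every step, but the overall trend is downward,
# ending in a small region around the minimum rather than exactly at it.
```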
