Stochastic gradient descent

Last revised by Andrew Murphy on 23 Jul 2019

Stochastic gradient descent is an optimization algorithm that improves the efficiency of the gradient descent algorithm. Like batch gradient descent, stochastic gradient descent performs a series of steps to minimize a cost function. Unlike batch gradient descent, which becomes computationally expensive on large data sets because every step uses the entire data set, stochastic gradient descent takes cheaper steps based on individual examples and can therefore reach a comparable result far more efficiently.

After randomizing (shuffling) the data set, stochastic gradient descent computes the gradient from a single example and updates the parameters immediately, so the cost function begins to change after the very first example. This requires far less computation per update than batch gradient descent, which iterates through all the examples in the data set before making a single update to reduce the cost function.
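As a rough sketch of this idea, the Python snippet below applies stochastic gradient descent to linear regression with a squared-error cost. The function name, learning rate, and number of epochs are all illustrative assumptions, not part of any particular library; the point is only that the data set is shuffled and the parameters are updated after each individual example.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=10, seed=0):
    """Illustrative stochastic gradient descent for linear regression.

    Parameters are updated after each training example rather than
    after a full pass over the data set.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    theta = np.zeros(n_features)

    for _ in range(epochs):
        # Randomize (shuffle) the order of the examples before each pass.
        order = rng.permutation(n_samples)
        for i in order:
            # Gradient of the squared error for this single example.
            error = X[i] @ theta - y[i]
            grad = error * X[i]
            # Update the parameters immediately, using only this example.
            theta -= lr * grad
    return theta
```

Shuffling before each pass is what keeps the single-example updates from following the same fixed order every time, which is why the data set is randomized first.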

Because stochastic gradient descent processes only one example per iteration, however, there is no guarantee that the cost function decreases with every step. Over many iterations, its general direction still converges towards the minimum. Stochastic gradient descent does not usually settle exactly at the minimum of the cost function; instead, it keeps oscillating in a small region around it. On large data sets with millions of examples, after a reasonable number of iterations the value of the cost function will be so close to the minimum that any difference is negligible.
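As a rough illustration of this behaviour, the sketch below (using synthetic data and a fixed learning rate, both of which are assumptions) records the full cost after every single-example update. The recorded sequence is not monotone, but it trends downward and settles near the minimum.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: a bias column plus one feature, with a known linear target.
X = np.c_[np.ones(200), rng.uniform(-1, 1, 200)]
y = X @ np.array([2.0, -3.0]) + rng.normal(0, 0.1, 200)

theta = np.zeros(2)
lr = 0.05
costs = []
for i in rng.permutation(len(y)):
    # One single-example update of stochastic gradient descent.
    theta -= lr * (X[i] @ theta - y[i]) * X[i]
    # Full squared-error cost, recorded only to show how it fluctuates.
    costs.append(np.mean((X @ theta - y) ** 2) / 2)

# costs does not decrease at every step, but the overall trend is downward,
# ending in a small region around the minimum rather than exactly at it.
```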
