Student Perspectives: An Introduction to Stochastic Gradient Methods

A post by Ettore Fincato, PhD student on the Compass programme.

This post provides an introduction to Gradient Methods in Stochastic Optimisation. This class of algorithms is the foundation of my current research work with Prof. Christophe Andrieu and Dr. Mathieu Gerber, and finds applications in a great variety of topics, such as regression estimation, support vector machines, convolutional neural networks.

We can see below a simulation by Emilien Dupont (https://emiliendupont.github.io/) which represents two trajectories of an optimisation process of a time-varying function. This well describes the main idea behind the algorithms we will be looking at, that is, using the (stochastic) gradient of a (random) function to iteratively reach the optimum.

Stochastic Optimisation

Stochastic optimisation was introduced by [1], and its aim is to find a scheme for solving equations of the form \nabla_w g(w)=0 given “noisy” measurements of g [2].

In the simplest deterministic framework, one can fully determine the analytical form of g(w), knows that it is differentiable and admits an unique minimum – hence the problem

w_*=\underset{w}{\text{argmin}}\quad g(w)

is well defined and solved by \nabla_w g(w)=0.

On the other hand, one may not be able to fully determine g(w) because his experiment is corrupted by a random noise. In such cases, it is common to identify this noise with a random variable, say V, consider an unbiased estimator \eta(w,V) s.t. \mathbb{E}_V[\eta(w,V)]=g(w) and to rewrite the problem as

w_*=\underset{w}{\text{argmin}}\quad\mathbb{E}_V[\eta(w,V)].

(more…)

Skip to toolbar