Convergence of Adaptive Stochastic Gradient Descent – Yuege Xie

26th November 2020
15:00 - 16:00


COMPASS Seminar Series


We are thrilled to announce that on Thursday the 26th of November at 3pm UK time, Yuege Xie (University of Texas at Austin) will give a talk about her recent paper Linear Convergence of Adaptive Stochastic Gradient Descent. She will introduce some background on adaptive gradient descent algorithms and then present the key ideas behind the paper; you can read more in the synopsis below.


We prove that the norm version of the adaptive stochastic gradient method (AdaGrad-Norm) achieves a linear convergence rate for a subset of either strongly convex functions or non-convex functions that satisfy the Polyak-Lojasiewicz (PL) inequality. The paper introduces the notion of Restricted Uniform Inequality of Gradients (RUIG), a measure of the balancedness of the stochastic gradient norms, to depict the landscape of a function. RUIG plays a key role in proving the robustness of AdaGrad-Norm to its hyper-parameter tuning in the stochastic setting. On top of RUIG, we develop a two-stage framework to prove the linear convergence of AdaGrad-Norm without knowing the parameters of the objective functions. This framework can likely be extended to other adaptive stepsize algorithms. The numerical experiments validate the theory and suggest future directions for improvement. I will also discuss the background of adaptive gradient descent algorithms and possible extensions. This is joint work with Xiaoxia Wu and Rachel Ward.
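For readers unfamiliar with AdaGrad-Norm, the update rule can be sketched in a few lines: a single running scalar accumulates the squared gradient norms, so the effective stepsize adapts without knowledge of the objective's smoothness or strong-convexity constants. The sketch below is a minimal illustration, not the authors' implementation; the toy quadratic objective and the defaults `eta=1.0` and `b0=0.1` are assumptions for demonstration only.

```python
import numpy as np

def adagrad_norm(grad_fn, x0, eta=1.0, b0=0.1, n_steps=200):
    """Minimal AdaGrad-Norm sketch: x_{t+1} = x_t - (eta / b_{t+1}) * g_t,
    where b_{t+1}^2 = b_t^2 + ||g_t||^2 accumulates squared gradient norms."""
    x = np.asarray(x0, dtype=float)
    b_sq = b0 ** 2
    for _ in range(n_steps):
        g = grad_fn(x)
        b_sq += np.dot(g, g)                # accumulate squared gradient norm
        x = x - (eta / np.sqrt(b_sq)) * g   # adaptive stepsize update
    return x

# Toy strongly convex objective f(x) = 0.5 * ||x||^2, whose gradient is x.
x_final = adagrad_norm(lambda x: x, x0=[2.0, -1.0])
```

On this toy problem the iterates contract geometrically once the accumulator stabilizes, which is the kind of linear-rate behaviour the paper establishes for the stochastic setting under the PL inequality.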


You can join the seminar by clicking on the following ZOOM link or typing 8450280967 in the ZOOM client. To hear more about the COMPASS Seminar Series and to make sure you stay up to date with our events, join the mailing list!

Alessio Zakaria has very kindly offered to host a prep session on Wednesday the 18th of November at 1pm UK time. He will walk us through convergence results for deterministic and stochastic gradient descent and the assumptions they rely on. To attend the prep session, just use the same link provided above!