
Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution

Antonio Orvieto, Simon Lacoste-Julien, Nicolas Loizou
May 2022
Abstract
Recently, Loizou et al. (2021) proposed and analyzed stochastic gradient descent (SGD) with the stochastic Polyak stepsize (SPS). SPS comes with strong convergence guarantees and competitive performance; however, it has two main drawbacks when used in non-over-parameterized regimes: (i) it requires a priori knowledge of the optimal mini-batch losses, which are not available when the interpolation condition is not satisfied (e.g., for regularized losses), and (ii) it guarantees convergence only to a neighborhood of the solution. In this work, we study the dynamics and convergence properties of SGD equipped with new variants of the stochastic Polyak stepsize and provide solutions to both drawbacks of the original SPS. We first show that a simple modification of the original SPS that uses lower bounds instead of the optimal function values directly solves issue (i). Solving issue (ii), on the other hand, turns out to be more challenging and yields valuable insights into the method's behavior. We show that if interpolation is not satisfied, the correlation between SPS and the stochastic gradients introduces a bias. This bias effectively distorts the expectation of the gradient signal near minimizers, leading to non-convergence even if the stepsize is scaled down during training. This phenomenon stands in direct contrast to the behavior of SGD, for which classical results guarantee convergence under simple stepsize annealing. To fix this issue, we propose DecSPS, a novel modification of SPS that guarantees convergence to the exact minimizer without a priori knowledge of the problem parameters. We show that the new variant of SPS works well in both smooth and non-smooth settings.
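The abstract does not state the update rules, so the following is only a minimal sketch under stated assumptions. It assumes the SPS_max form associated with Loizou et al. (2021), gamma_k = min{(f_i(x_k) - l_i^*) / (c * ||grad f_i(x_k)||^2), gamma_b}, with a lower bound l_i^* substituted for the unknown optimum f_i^* (the fix for issue (i)). It also assumes a DecSPS-style decreasing rule, gamma_k = min{(f_i(x_k) - l_i^*) / ||grad f_i(x_k)||^2, c_{k-1} * gamma_{k-1}} / c_k with c_k = sqrt(k + 1). The toy regularized 1-D problem, the data `A`, `LAM`, and all function names are hypothetical illustrations, not the paper's experiments; consult the paper for the exact definitions and assumptions.

```python
import math
import random

# Hypothetical toy problem (not from the paper): f_i(x) = 0.5*(x - a_i)**2 + 0.5*LAM*x**2.
# The per-sample minimizers a_i / (1 + LAM) differ, so interpolation fails, and the
# L2 term makes most per-sample minima f_i^* strictly positive (a regularized loss).
A = [-1.0, 0.0, 2.0, 3.0]
LAM = 0.1
LOWER_BOUND = 0.0  # every f_i >= 0, so 0 is a valid lower bound l_i^* <= f_i^*


def loss(x, a):
    return 0.5 * (x - a) ** 2 + 0.5 * LAM * x ** 2


def grad(x, a):
    return (x - a) + LAM * x


def polyak_ratio(x, a):
    """(f_i(x) - l_i^*) / |grad f_i(x)|^2, or +inf when the gradient vanishes."""
    g = grad(x, a)
    if g == 0.0:
        return math.inf
    return (loss(x, a) - LOWER_BOUND) / (g * g)


def sps_max(x0, iters, c=0.5, gamma_b=1.0, seed=0):
    """Assumed SPS_max with a lower bound: gamma = min(ratio / c, gamma_b)."""
    rng = random.Random(seed)
    x, steps = x0, []
    for _ in range(iters):
        a = rng.choice(A)  # sample one f_i uniformly
        gamma = min(polyak_ratio(x, a) / c, gamma_b)
        x -= gamma * grad(x, a)
        steps.append(gamma)
    return x, steps


def dec_sps(x0, iters, gamma_b=1.0, seed=0):
    """Assumed DecSPS-style rule with c_k = sqrt(k + 1):
    gamma_k = min(ratio_k, c_{k-1} * gamma_{k-1}) / c_k, with c_{-1} * gamma_{-1} = c_0 * gamma_b."""
    rng = random.Random(seed)
    x, steps = x0, []
    prev = gamma_b  # holds c_{k-1} * gamma_{k-1}; c_0 = 1
    for k in range(iters):
        a = rng.choice(A)
        c_k = math.sqrt(k + 1)
        gamma = min(polyak_ratio(x, a), prev) / c_k
        prev = c_k * gamma  # non-increasing by construction
        x -= gamma * grad(x, a)
        steps.append(gamma)
    return x, steps


# Minimizer of the average loss: (1 + LAM) * x = mean(A).
x_star = (sum(A) / len(A)) / (1.0 + LAM)

x_sps, sps_steps = sps_max(5.0, 200_000)
x_dec, dec_steps = dec_sps(5.0, 200_000)
print(f"x* = {x_star:.4f}, SPS_max: {x_sps:.4f}, DecSPS: {x_dec:.4f}")
print(f"last stepsizes: SPS_max {sps_steps[-1]:.4f}, DecSPS {dec_steps[-1]:.6f}")
```

On this toy problem each f_i is a quadratic with curvature L = 1 + LAM, so the Polyak ratio never drops below 1/(2L). With c = 0.5 the SPS_max stepsizes therefore stay at roughly 1/L or more, and the iterate keeps landing near the minimizer of whichever f_i was sampled. The DecSPS-style stepsizes instead shrink at least as fast as gamma_b / sqrt(k + 1), and the iterate settles near x*.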
