
# Taming Fat-Tailed ("Heavier-Tailed" with Potentially Infinite Variance) Noise in Federated Learning

Oct 2022

A key assumption in most existing works on FL algorithms' convergence analysis is that the noise in stochastic first-order information has a finite variance. Although this assumption covers all light-tailed (i.e., sub-exponential) and some heavy-tailed noise distributions (e.g., log-normal, Weibull, and some Pareto distributions), it fails for many fat-tailed noise distributions (i.e., "heavier-tailed" with potentially infinite variance) that have been empirically observed in the FL literature. To date, it remains unclear whether one can design convergent algorithms for FL systems that experience fat-tailed noise. This motivates us to fill this gap in this paper by proposing an algorithmic framework called FAT-Clipping (federated averaging with two-sided learning rates and clipping), which contains two variants: FAT-Clipping per-round (FAT-Clipping-PR) and FAT-Clipping per-iteration (FAT-Clipping-PI). Specifically, for the largest $\alpha \in (1,2]$ such that the fat-tailed noise in FL still has a bounded $\alpha$-moment, we show that both variants achieve $\mathcal{O}((mT)^{\frac{2-\alpha}{\alpha}})$ and $\mathcal{O}((mT)^{\frac{1-\alpha}{3\alpha-2}})$ convergence rates in the strongly-convex and general non-convex settings, respectively, where $m$ and $T$ are the numbers of clients and communication rounds. Moreover, at the expense of more clipping operations compared to FAT-Clipping-PR, FAT-Clipping-PI further enjoys a linear speedup effect with respect to the number of local updates at each client and is lower-bound-matching (i.e., order-optimal). Collectively, our results advance the understanding of designing efficient algorithms for FL systems that exhibit fat-tailed first-order oracle information.
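To make the distinction between the two variants concrete, here is a minimal NumPy sketch of the clipping operator and of where the two variants apply it: per-round clips each client's *accumulated* round update once, while per-iteration clips *every* local stochastic gradient. All function and parameter names (`clip`, `fat_clipping_pr`, `fat_clipping_pi`, `eta_local`, `eta_global`, `lam`) are illustrative assumptions, not the paper's pseudocode, and server-side bookkeeping (model state, client sampling) is omitted.

```python
import numpy as np

def clip(v, lam):
    """Standard clipping operator: rescale v so that ||v|| <= lam."""
    norm = np.linalg.norm(v)
    return v if norm <= lam else (lam / norm) * v

def fat_clipping_pr(grads_per_client, eta_local, eta_global, lam):
    """Per-round sketch: each client runs its local SGD steps, then the
    round-level model difference is clipped ONCE before averaging.
    grads_per_client: list (one per client) of arrays of shape
    (num_local_steps, dim) holding the local stochastic gradients."""
    deltas = [clip(-eta_local * g.sum(axis=0), lam) for g in grads_per_client]
    # Server applies its own (global) learning rate to the averaged update.
    return eta_global * np.mean(deltas, axis=0)

def fat_clipping_pi(grads_per_client, eta_local, eta_global, lam):
    """Per-iteration sketch: every local stochastic gradient is clipped
    before being applied -- more clipping operations per round."""
    deltas = [-eta_local * sum(clip(g_t, lam) for g_t in g)
              for g in grads_per_client]
    return eta_global * np.mean(deltas, axis=0)
```

The two-sided learning rates of the framework appear here as the separate client-side (`eta_local`) and server-side (`eta_global`) step sizes; the clipping threshold `lam` caps the influence of any single fat-tailed gradient sample.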
