
Attentional-Biased Stochastic Gradient Descent

Qi Qi, Yi Xu, Rong Jin, Wotao Yin, Tianbao Yang
Dec 2020
Abstract
In this paper, we present a simple yet effective method (ABSGD) for addressing the data imbalance issue in deep learning. Our method is a simple modification to momentum SGD in which we leverage an attentional mechanism to assign an individual importance weight to each gradient in the mini-batch. Unlike many existing heuristic-driven methods for tackling data imbalance, our method is grounded in theoretically justified distributionally robust optimization (DRO) and is guaranteed to converge to a stationary point of an information-regularized DRO problem. The individual-level weight of a sampled data point is proportional to the exponential of a scaled loss value of that point, where the scaling factor is interpreted as the regularization parameter in the framework of information-regularized DRO. Compared with existing class-level weighting schemes, our method can capture the diversity between individual examples within each class. Compared with existing individual-level weighting methods based on meta-learning, which require three backward propagations to compute mini-batch stochastic gradients, our method is more efficient, requiring only one backward propagation per iteration as in standard deep learning methods. To balance the learning of the feature extraction layers and the learning of the classifier layer, we employ a two-stage method that uses SGD for pretraining, followed by ABSGD for learning a robust classifier and finetuning the lower layers. Our empirical studies on several benchmark datasets demonstrate the effectiveness of the proposed method.
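The weighting rule summarized above (per-sample weights proportional to the exponential of a scaled loss, combined into a single weighted mini-batch loss and one backward pass) can be sketched in a few lines of PyTorch. This is only an illustrative approximation, not the authors' released implementation: the function name `absgd_step`, the parameter `lam`, and the per-mini-batch softmax normalization are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def absgd_step(model, optimizer, inputs, targets, lam=0.5):
    """One training step with attention-style per-sample weighting.

    Each example's weight is proportional to exp(loss_i / lam), i.e. a
    softmax over the scaled per-sample losses, so higher-loss examples
    receive larger weights; lam plays the role of the scaling /
    regularization parameter mentioned in the abstract.
    """
    per_sample_loss = F.cross_entropy(model(inputs), targets, reduction="none")
    # Detach the weights so gradients flow only through the weighted loss terms.
    weights = torch.softmax(per_sample_loss.detach() / lam, dim=0)
    loss = (weights * per_sample_loss).sum()

    optimizer.zero_grad()
    loss.backward()   # a single backward pass, as in standard SGD training
    optimizer.step()
    return loss.item()

# Usage with momentum SGD (hypothetical model and data loader):
# optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# for inputs, targets in loader:
#     absgd_step(model, optimizer, inputs, targets, lam=0.5)
```

Note that this sketch normalizes the weights within each mini-batch for simplicity; the paper's estimator of the normalization term may differ.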