Adam
Adam is an adaptive learning rate optimization algorithm that utilises both momentum and per-parameter scaling, combining the benefits of RMSProp and SGD with Momentum. The optimizer is designed to work well on non-stationary objectives and on problems with very noisy and/or sparse gradients. The weight updates are performed as:

$$ w_{t} = w_{t-1} - \eta\frac{\hat{m}_{t}}{\sqrt{\hat{v}_{t}} + \epsilon} $$

with

$$ \hat{m}_{t} = \frac{m_{t}}{1-\beta^{t}_{1}} \qquad \hat{v}_{t} = \frac{v_{t}}{1-\beta^{t}_{2}} $$

$$ m_{t} = \beta_{1}m_{t-1} + (1-\beta_{1})g_{t} \qquad v_{t} = \beta_{2}v_{t-1} + (1-\beta_{2})g_{t}^{2} $$

Here $ \eta $ is the step size/learning rate, around 1e-3 in the original paper; $ \epsilon $ is a small constant, typically 1e-8 or 1e-10, that prevents division by zero; and $ \beta_{1} $ and $ \beta_{2} $ are forgetting (decay) parameters, with typical values 0.9 and 0.999, respectively.
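The update rule above can be written out directly. The following is a minimal sketch in plain Python/NumPy, not a reference implementation; the helper name `adam_update` and the toy quadratic objective are illustrative assumptions, and the hyperparameter defaults follow the typical values quoted above.

```python
import numpy as np

def adam_update(w, g, m, v, t, eta=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step for parameters w given gradient g (hypothetical helper)."""
    m = beta1 * m + (1 - beta1) * g           # first-moment (momentum) estimate m_t
    v = beta2 * v + (1 - beta2) * g**2        # second-moment (scaling) estimate v_t
    m_hat = m / (1 - beta1**t)                # bias-corrected m_t
    v_hat = v / (1 - beta2**t)                # bias-corrected v_t
    w = w - eta * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy usage (assumed example): minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 2001):                      # t starts at 1 so the bias correction is defined
    g = 2 * w
    w, m, v = adam_update(w, g, m, v, t)
print(w)                                      # approaches [0, 0]
```

Note that the bias-correction terms matter most in the first few steps, when $ m_{t} $ and $ v_{t} $ are still dominated by their zero initialization.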
Related topics: SGD, RMSProp, AdaGrad, SMITH, AdaDelta, ReLU, AMSGrad, Stochastic Optimization, Softmax, CAM