Adam

Adam is an adaptive learning rate optimization algorithm that uses both momentum and per-parameter scaling, combining the benefits of RMSProp and SGD with Momentum. The optimizer is designed to work well on non-stationary objectives and on problems with very noisy and/or sparse gradients. The weight update is

$$ w_{t} = w_{t-1} - \eta\frac{\hat{m}_{t}}{\sqrt{\hat{v}_{t}} + \epsilon} $$

with

$$ \hat{m}_{t} = \frac{m_{t}}{1-\beta_{1}^{t}} \qquad \hat{v}_{t} = \frac{v_{t}}{1-\beta_{2}^{t}} $$

$$ m_{t} = \beta_{1}m_{t-1} + (1-\beta_{1})g_{t} \qquad v_{t} = \beta_{2}v_{t-1} + (1-\beta_{2})g_{t}^{2} $$

Here $ \eta $ is the step size (learning rate), around 1e-3 in the original paper; $ \epsilon $ is a small constant, typically 1e-8 or 1e-10, that prevents division by zero; and $ \beta_{1} $ and $ \beta_{2} $ are the forgetting (exponential decay) parameters, with typical values 0.9 and 0.999, respectively.
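
As a minimal sketch of these update equations (not the original paper's reference implementation), the NumPy code below applies one Adam step per call; the function name adam_update and the toy quadratic objective in the usage loop are hypothetical, introduced only for illustration.

```python
import numpy as np

def adam_update(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step for parameters w given gradient g.

    m, v are the running first/second moment estimates; t is the
    1-based step count. (Hypothetical helper, for illustration only.)
    """
    m = beta1 * m + (1 - beta1) * g             # first moment (momentum)
    v = beta2 * v + (1 - beta2) * g**2          # second moment (scaling)
    m_hat = m / (1 - beta1**t)                  # bias correction of m
    v_hat = v / (1 - beta2**t)                  # bias correction of v
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps) # parameter update
    return w, m, v

# Usage sketch: minimise f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0, 3.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 1001):
    g = 2 * w
    w, m, v = adam_update(w, g, m, v, t)
```

The bias-correction terms matter most in the first few steps, when m and v are still close to their zero initialization and would otherwise understate the true moment estimates.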
Related topics: SGD, RMSProp, AdaGrad, SMITH, AdaDelta, ReLU, AMSGrad, Stochastic Optimization, Softmax, CAM

