Convergence to good non-optimal critical points in the training of neural networks: Gradient descent optimization with one random initialization overcomes all bad non-global local minima with high probability

Shokhrukh Ibragimov, Arnulf Jentzen, Adrian Riekert
Dec 2022
Abstract
Gradient descent (GD) methods for the training of artificial neural networks (ANNs) are nowadays among the most heavily employed computational schemes in the digital world. Despite the compelling success of such methods, it remains an open problem to provide a rigorous theoretical justification for the success of GD methods in the training of ANNs. The main difficulty is that the optimization risk landscapes associated with ANNs usually admit many non-optimal critical points (saddle points as well as non-global local minima) whose risk values are strictly larger than the optimal risk value. It is a key contribution of this article to overcome this obstacle in certain simplified shallow ANN training situations. In such simplified ANN training scenarios we prove that the gradient flow (GF) dynamics with only one random initialization overcomes with high probability all bad non-global local minima (all non-global local minima whose risk values are much larger than the risk value of the global minima) and converges with high probability to a good critical point (a critical point whose risk value is very close to the optimal risk value of the global minima). This analysis allows us to establish convergence in probability to zero of the risk value of the GF trajectories, with convergence rates, as the ANN training time and the width of the ANN increase to infinity. We complement the analytical findings of this work with extensive numerical simulations for shallow and deep ANNs: all these numerical simulations strongly suggest that with high probability the considered GD method (stochastic GD or Adam) overcomes all bad non-global local minima, does not converge to a global minimum, but does converge to a good non-optimal critical point whose risk value is very close to the optimal risk value.
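The abstract describes gradient flow and gradient descent training of shallow ANNs from a single random initialization. As a rough illustration of the kind of experiment reported there (not the authors' code; the target function, network width, step size, and step count below are illustrative assumptions), the following sketch runs full-batch gradient descent, a simple discretization of the gradient flow, on the empirical L2 risk of a shallow ReLU network and prints the risk along the trajectory:

```python
import numpy as np

# Minimal sketch of the experiment type described in the abstract: full-batch
# gradient descent (a discretization of the gradient flow) on the empirical
# L2 risk of a shallow ReLU network with one random Gaussian initialization.
# Target function, width, step size, and step count are assumptions made
# here for illustration only.

rng = np.random.default_rng(0)

# Supervised regression data on [0, 1]; the target is an arbitrary choice.
N = 256
x = np.linspace(0.0, 1.0, N)
y = np.sin(2 * np.pi * x)

# Shallow ReLU network: f(x) = sum_k v_k * relu(w_k * x + b_k).
width = 64
w = rng.normal(size=width)               # one random initialization
b = rng.normal(size=width)
v = rng.normal(size=width) / np.sqrt(width)

lr, steps = 1e-2, 20_000

for step in range(steps):
    z = np.outer(x, w) + b               # pre-activations, shape (N, width)
    a = np.maximum(z, 0.0)               # ReLU activations
    f = a @ v                            # network output, shape (N,)
    r = f - y                            # residuals
    risk = np.mean(r ** 2)               # empirical L2 risk

    # Full-batch gradients of the risk with respect to v, w, b.
    grad_v = (2.0 / N) * (a.T @ r)
    mask = (z > 0.0).astype(float) * v   # v_k * 1[z_ik > 0], shape (N, width)
    grad_w = (2.0 / N) * ((mask * x[:, None]).T @ r)
    grad_b = (2.0 / N) * (mask.T @ r)

    v -= lr * grad_v
    w -= lr * grad_w
    b -= lr * grad_b

    if step % 5000 == 0:
        print(f"step {step:6d}  risk {risk:.6f}")

# Per the abstract, the trajectory typically ends near a good non-optimal
# critical point: the final risk is close to, but not exactly, optimal.
z = np.outer(x, w) + b
print(f"final risk {np.mean((np.maximum(z, 0.0) @ v - y) ** 2):.6f}")
```

In this toy setup the final printed risk is typically small but nonzero, which is consistent with the abstract's claim of convergence to a good non-optimal critical point rather than to a global minimum.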