Optimization method AMSGrad in multilayer neural networks

Authors

Abstract

The most common method for optimizing neural networks is gradient descent. Gradient descent is an optimization algorithm that follows the negative gradient of the objective function to locate a minimum of the error function.
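As a minimal sketch of this idea (the objective and learning rate below are illustrative choices, not taken from the article):

```python
def gradient_descent(grad_fn, theta, lr=0.1, steps=100):
    """Follow the negative gradient with a single global learning rate."""
    for _ in range(steps):
        theta = theta - lr * grad_fn(theta)
    return theta

# Hypothetical objective f(x) = (x - 2)^2, whose gradient is 2(x - 2);
# the iterates contract toward the minimizer x = 2.
x = gradient_descent(lambda t: 2.0 * (t - 2.0), 0.0)
```

Note that the same `lr` is used for every parameter, which is exactly the limitation the adaptive methods below address.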

A limitation of gradient descent is that it applies a single learning rate to all input variables. Extensions of gradient descent, such as the Adaptive Moment Estimation (Adam) algorithm, maintain a separate learning rate for each input variable, but these per-variable rates can decay to very small values too quickly.

The AMSGrad method is an enhanced version of Adam that aims to improve the convergence properties of the algorithm by avoiding large, abrupt changes in the learning rate for each input variable: it keeps the running maximum of the second-moment estimate, so the effective step size for each variable never increases. Technically, gradient descent is called a first-order optimization algorithm because it explicitly uses only the first-order derivative of the objective function.
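The AMSGrad update can be sketched as follows. This is a minimal illustration of the standard update rule (Reddi et al.), not the authors' implementation; the test objective and hyperparameters are assumptions:

```python
import numpy as np

def amsgrad(grad_fn, theta, lr=0.01, beta1=0.9, beta2=0.999,
            eps=1e-8, steps=2000):
    """AMSGrad: Adam with a non-increasing effective per-variable step,
    enforced by tracking the running maximum of the second moment."""
    m = np.zeros_like(theta)      # first moment (moving average of gradients)
    v = np.zeros_like(theta)      # second moment (moving average of squares)
    v_hat = np.zeros_like(theta)  # running max of v -- the AMSGrad change
    for _ in range(steps):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        v_hat = np.maximum(v_hat, v)  # Adam would use v directly here
        theta = theta - lr * m / (np.sqrt(v_hat) + eps)
    return theta

# Hypothetical objective f(x) = (x - 3)^2 with gradient 2(x - 3);
# the iterates move toward the minimizer x = 3.
x = amsgrad(lambda t: 2.0 * (t - 3.0), np.array([0.0]))
```

The only difference from Adam is the `np.maximum` line: because `v_hat` can only grow, the denominator of the update never shrinks, which rules out the sudden increases in effective learning rate that can hurt Adam's convergence.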


References

Bordes A., Bottou L., Gallinari P. SGD-QN: Careful quasi-Newton stochastic gradient descent. Journal of Machine Learning Research. 2009. Vol. 10. P. 1737-1754. URL: https://www.jmlr.org/papers/volume10/bordes09a/bordes09a.pdf.

Olenych Y. et al. Features of deep study neural network. OpenReviewHub. URL: https://openreviewhub.org/lea/paper-2019/features-deep-study-neural-network#.

Rudenko O., Bodianskyy E. Artificial neural networks. Kharkiv, Ukraine: SMIT Company, 2006. (In Ukrainian).

Ruder S. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747. 2016. URL: https://ruder.io/optimizing-gradient-descent/index.html#adamax.

Subotin S. Neural networks: theory and practice. Zhytomyr, Ukraine: Publisher О. О. Evenok, 2020. URL: http://eir.zp.edu.ua/handle/123456789/6800 (In Ukrainian).

Sveleba S. et al. Chaotic states of a multilayer neural network. Electronics and Information Technologies. 2021. № 16. P. 20-35. http://dx.doi.org/10.30970/eli.16.3. (In Ukrainian).

Sveleba S. et al. Multilayer neural networks – as determined systems. Computational Problems of Electrical Engineering. 2021. Vol. 11, № 2. P. 26-31. https://doi.org/10.23939/jcpee2021.02.026.

Taranenko Yu. Information entropy of chaos. URL: https://habr.com/ru/post/447874/.

Published

2023-06-06

Issue

Section

Information technology and project management

How to Cite

Sveleba, S., & Sveleba, N. (2023). Optimization method AMSGrad in multilayer neural networks. Challenges and Issues of Modern Science, 1, 446-456. https://cims.fti.dp.ua/j/article/view/87
