Optimization method AMSGrad in multilayer neural networks
Abstract
The most common method for optimizing neural networks is gradient descent, an optimization algorithm that follows the negative gradient of the objective function to find the minimum of the error function. Technically, gradient descent is called a first-order optimization algorithm because it explicitly uses the first-order derivative of the objective function.
A limitation of gradient descent is that it applies a single learning rate to all input variables. Extensions of gradient descent, such as the Adaptive Moment Estimation (Adam) algorithm, maintain a separate learning rate for each input variable, but these adaptive rates can quickly decay to very small values.
The AMSGrad method is an enhanced version of Adam that aims to improve the convergence properties of the algorithm by preventing large, abrupt changes in the learning rate for each input variable.
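The update rule described above can be sketched as follows. This is a minimal, illustrative implementation of a single AMSGrad step on scalar parameters, not the authors' code; the function name and hyperparameter defaults (the usual lr = 0.001, beta1 = 0.9, beta2 = 0.999) are assumptions chosen for the example.

```python
import math

def amsgrad_step(theta, grad, m, v, v_hat,
                 lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad update for a scalar parameter.

    Unlike Adam, the denominator uses the running maximum v_hat of the
    second-moment estimate, so the effective per-parameter learning rate
    can only shrink or stay the same, never jump back up abruptly.
    """
    m = beta1 * m + (1 - beta1) * grad          # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment (uncentered variance)
    v_hat = max(v_hat, v)                       # the key AMSGrad change vs. Adam
    theta = theta - lr * m / (math.sqrt(v_hat) + eps)
    return theta, m, v, v_hat

# Usage: minimize f(x) = x^2 starting from x = 3.
theta, m, v, v_hat = 3.0, 0.0, 0.0, 0.0
for _ in range(2000):
    grad = 2.0 * theta                          # gradient of x^2
    theta, m, v, v_hat = amsgrad_step(theta, grad, m, v, v_hat, lr=0.05)
print(theta)  # converges toward the minimum at 0
```

Because `v_hat` is monotonically non-decreasing, the step size for each variable is non-increasing, which is exactly the convergence safeguard that distinguishes AMSGrad from plain Adam.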
Downloads
License
Copyright (c) 2024 Challenges and Issues of Modern Science
This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in the journal Challenges and Issues of Modern Science are licensed under the Creative Commons Attribution 4.0 International (CC BY) license. This means that you are free to:
- Share, copy, and redistribute the article in any medium or format
- Adapt, remix, transform, and build upon the article
as long as you provide appropriate credit to the original work by including the authors' names, the article title, and the journal name, and indicate that the work is licensed under CC BY. Any use of the material must not imply endorsement by the authors or the journal.