Age Estimation from a Photo Using Neural Networks

Yevhenii Verbenko; Olga Matsuga

Authors

Yevhenii Verbenko, Oles Honchar Dnipro National University
https://orcid.org/0009-0001-8438-4990
Olga Matsuga, Oles Honchar Dnipro National University
https://orcid.org/0000-0001-6444-8566

Keywords:

age estimation, neural networks, regression, UTKFace

Abstract

The aim of this work was to compare neural network architectures for estimating age from face images. Because age is a continuous variable, predicting a person's age from an image of their face is treated as a regression problem. The work used the UTKFace dataset, which contains 24,000 face images annotated with gender, race, and age. Four architectures were chosen for training: AlexNet, VGG-19, ResNet-50, and Inception-v4. Each of these convolutional neural networks marked a significant advance in image classification on the ImageNet dataset. AlexNet popularized ReLU activations and dropout alongside max-pooling, while VGG-19 showed the benefit of deeper networks built from small filters. ResNet-50 addressed the vanishing gradient problem with residual connections, and Inception-v4 improved efficiency and gradient flow with optimized blocks and residual connections. In every network, the last layer was replaced with a fully connected layer consisting of a single neuron with a linear activation function. The mean squared error (MSE) served as the loss function during training, and the mean absolute error (MAE) served as the quality metric. The data were split into training and test sets in a 90% to 10% ratio. Before training, the images were normalized and resized to match the input size each network requires. AlexNet and VGG-19 were trained with the SGD optimizer at a learning rate of 0.2, ResNet-50 with the Adam optimizer at a learning rate of 0.02, and Inception-v4 with the Adadelta optimizer at a learning rate of 0.02. These optimizers and learning rates were selected as the best performing in computational experiments. Each network was trained for as many epochs as it needed to converge, so the number of epochs differed between networks. After training, VGG-19 and ResNet-50 achieved MAE values of 2.7 and 3.5, respectively, while Inception-v4 reached an MAE of 3.87. AlexNet exhibited significant overfitting. ResNet-50 processed images the fastest.
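
The evaluation protocol in the abstract (a 90%/10% split, MSE as the training loss, MAE as the reported metric) can be illustrated with a minimal sketch in plain Python. The function names, the random shuffle with a fixed seed, and the sample ages are assumptions made for illustration; the abstract does not say how the split was drawn or which framework was used.

```python
import random


def split_90_10(items, seed=0):
    """Shuffle a copy of the items and split it 90% / 10%.

    The shuffle and the seed are assumptions; the abstract only
    states the 90% to 10% ratio.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 9 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def mse(y_true, y_pred):
    """Mean squared error: the loss minimized during training."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: the quality metric reported for each network."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical ages, not taken from UTKFace or from the paper's results.
true_ages = [25.0, 40.0, 63.0, 8.0]
predicted_ages = [27.0, 37.0, 60.0, 9.0]

print(mse(true_ages, predicted_ages))  # 5.75
print(mae(true_ages, predicted_ages))  # 2.25
```

MSE penalizes large errors quadratically, which makes it a smooth training objective, whereas MAE stays in the same units as the target and is easier to interpret as an average deviation in predicted age.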

References

UTKFace. Kaggle. https://www.kaggle.com/datasets/jangedoo/utkface-new/data

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet Classification with Deep Convolutional Neural Networks. Communications of the ACM, 60(6), 84–90. https://doi.org/10.1145/3065386

Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. 3rd International Conference on Learning Representations (ICLR 2015), 1–14. https://doi.org/10.48550/arXiv.1409.1556

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition, 770–778. https://doi.org/10.1109/CVPR.2016.90

Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. (2017). Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 31(1), 4278–4284. https://doi.org/10.1609/aaai.v31i1.11231