Gan Junying, Xiong Junling
School of Electronics and Information Engineering, Wuyi University, Jiangmen, 529020, Guangdong, China.
Sci Rep. 2025 Jan 22;15(1):2784. doi: 10.1038/s41598-025-86831-0.
Facial beauty prediction (FBP) is a leading area of research in artificial intelligence. FBP databases currently contain a small amount of labeled data and a large amount of unlabeled data, so models trained with supervision alone extract limited features and achieve low prediction accuracy. The masked autoencoder (MAE) is a self-supervised learning method that can outperform supervised methods without relying on large-scale databases and effectively improves a model's feature-extraction ability. A multi-scale convolution strategy expands the receptive field and, combined with the attention mechanism of the MAE, captures dependencies between distant pixels and acquires both shallow and deep image features. Knowledge distillation transfers the rich knowledge of a teacher net to a student net, reducing the number of parameters and compressing the model. In this paper, an MAE with a multi-scale convolution strategy is combined with knowledge distillation for FBP. First, the MAE model with the multi-scale convolution strategy is constructed and used to pretrain the teacher net. Second, an MAE model is constructed for the student net. Finally, the teacher net performs knowledge distillation, and the student net is optimized with the loss transmitted from the teacher net. Experimental results show that the proposed method outperforms other methods on the FBP task, improves FBP accuracy, and can be widely applied to tasks such as image classification.
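To make the two ingredients named in the abstract concrete, the sketch below (in PyTorch) shows (1) a multi-scale convolutional patch embedding that could feed an MAE-style encoder and (2) a standard soft-label knowledge-distillation loss from a teacher to a student. The kernel sizes, embedding dimension, temperature, and loss weighting are illustrative assumptions, not values reported in the paper, and the classification-style loss is only one plausible formulation of the FBP objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleConvEmbed(nn.Module):
    """Hypothetical multi-scale patch embedding: parallel convolutions with
    different kernel sizes widen the receptive field before the MAE encoder.
    Kernel sizes and embedding dim are illustrative, not from the paper."""
    def __init__(self, in_ch=3, dim=192, patch=16):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, dim // 3, kernel_size=k, stride=patch,
                      padding=(k - patch) // 2)
            for k in (16, 32, 48)  # assumed multi-scale kernels
        ])

    def forward(self, x):
        # Each branch: (B, dim//3, H/patch, W/patch); concatenate along channels
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        # Flatten spatial grid into a token sequence: (B, N, dim)
        return feats.flatten(2).transpose(1, 2)

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    """Generic soft-label knowledge distillation (Hinton-style); the paper's
    exact loss terms and weighting may differ."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard
```

In this reading, the pretrained teacher produces teacher_logits on each face image, and the smaller student MAE is trained on distillation_loss, which is one common way to realize the "loss transmitted from the teacher net" described above.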