Liu Yanju, Li Yange, Yi Xinhai, Hu Zuojin, Zhang Huiyu, Liu Yanzhong
School of Mathematics and Information Science, Nanjing Normal University of Special Education, Nanjing, China.
School of Computer and Control Engineering, Qiqihar University, Qiqihar, China.
Front Neurorobot. 2022 Jun 30;16:922761. doi: 10.3389/fnbot.2022.922761. eCollection 2022.
As opposed to macro-expressions, micro-expressions are subtle, hard-to-detect emotional expressions that often carry rich information about mental activity. Recognizing micro-expressions in practice is essential in interrogation and healthcare. Neural networks are currently among the most common approaches to micro-expression recognition, but they tend to grow in complexity as accuracy improves, and overly large networks place extremely high hardware demands on the devices that run them. In recent years, vision transformers based on self-attention mechanisms have achieved image recognition and classification accuracy no lower than that of neural networks; their drawback is that, lacking the image-specific inductive biases inherent to convolutional networks, they pay for higher accuracy with an exponential increase in parameter count. This paper describes training a facial expression feature extractor by transfer learning and then fine-tuning and optimizing the MobileViT model to perform the micro-expression recognition task. First, the CASME II, SAMM, and SMIC datasets are combined into a compound dataset, and macro-expression samples are extracted from three macro-expression datasets. Each macro-expression and micro-expression sample is pre-processed identically to make them comparable. Second, the macro-expression samples are used to train the MobileNetV2 block in MobileViT as a facial expression feature extractor, saving the weights at the point of highest accuracy. Finally, some hyperparameters of the MobileViT model are determined by grid search, and the micro-expression samples are fed in for training. The samples are classified with an SVM classifier. In the experiments, the proposed method achieved an accuracy of 84.27% and processed individual samples in only 35.4 ms.
Comparative experiments show that the proposed method matches state-of-the-art methods in accuracy while improving recognition efficiency.
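The final stage of the pipeline — grid-searching hyperparameters and classifying feature vectors with an SVM — can be sketched with scikit-learn. This is a hedged illustration: the random "features" below stand in for the embeddings the paper's extractor would produce, and the grid values and 3-class label set are assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for extracted facial-expression feature vectors.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 64))   # 200 samples, 64-dim embeddings (assumed)
labels = rng.integers(0, 3, size=200)   # 3 emotion classes (assumed)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0)

# Grid search over SVM hyperparameters, analogous to the abstract's
# grid search over model hyperparameters; candidate values are illustrative.
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}, cv=3)
grid.fit(X_train, y_train)
preds = grid.predict(X_test)
```

`grid.best_params_` reports the winning combination, and the fitted `grid` object can then score held-out samples directly.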