Enhancing Facial Expression Recognition through Light Field Cameras.

Affiliations

Institut de Mathématiques de Marseille (IMM), CNRS, Aix-Marseille University, 13009 Marseille, France.

Laboratoire d'Informatique et des Systèmes (LIS), CNRS, Aix-Marseille University, 13009 Marseille, France.

Publication information

Sensors (Basel). 2024 Sep 3;24(17):5724. doi: 10.3390/s24175724.

DOI: 10.3390/s24175724

PMID: 39275635

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11398184/

Abstract

In this paper, we study facial expression recognition (FER) using three modalities obtained from a light field camera: sub-aperture (SA), depth map, and all-in-focus (AiF) images. Our objective is to construct a more comprehensive and effective FER system by investigating multimodal fusion strategies. For this purpose, we employ EfficientNetV2-S, pre-trained on AffectNet, as our primary convolutional neural network; combined with a BiGRU, this model processes the SA images. We evaluate various fusion techniques at both the decision and feature levels to assess how much each improves FER accuracy. Our findings show that the model using SA images surpasses state-of-the-art performance, achieving 88.13% ± 7.42% accuracy under the subject-specific evaluation protocol and 91.88% ± 3.25% under the subject-independent evaluation protocol. These results highlight the model's potential for improving FER accuracy and robustness. Furthermore, our multimodal fusion approach, integrating SA, AiF, and depth images, demonstrates substantial improvements over unimodal models. Decision-level fusion, particularly with average weights, proved the most effective strategy, achieving 90.13% ± 4.95% accuracy under the subject-specific evaluation protocol and 93.33% ± 4.92% under the subject-independent evaluation protocol. This approach leverages the complementary strengths of each modality, resulting in a more comprehensive and accurate FER system.
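As a rough illustration of decision-level fusion with equal ("average") weights, the sketch below averages the class-probability vectors produced by separate SA, AiF, and depth classifiers and takes the argmax of the result. It assumes each unimodal model already outputs softmax probabilities over the same set of expression classes. The function names, the dictionary layout, and the probability values are hypothetical and not taken from the paper, whose exact fusion rule may differ.

```python
from typing import Dict, List, Optional, Sequence


def decision_level_fusion(
    probs_by_modality: Dict[str, Sequence[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Fuse per-modality class probabilities by a (normalized) weighted average.

    Hypothetical helper: with weights=None every modality receives the same
    weight 1/M, i.e. a plain average of the M probability vectors.
    """
    modalities = list(probs_by_modality)
    if not modalities:
        raise ValueError("need at least one modality")
    if weights is None:
        weights = {m: 1.0 / len(modalities) for m in modalities}
    total_w = sum(weights[m] for m in modalities)
    n_classes = len(probs_by_modality[modalities[0]])
    fused = [0.0] * n_classes
    for m in modalities:
        probs = probs_by_modality[m]
        if len(probs) != n_classes:
            raise ValueError(f"{m!r} has {len(probs)} classes, expected {n_classes}")
        # Normalize weights so the fused vector still sums to 1.
        for c in range(n_classes):
            fused[c] += (weights[m] / total_w) * probs[c]
    return fused


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the predicted class)."""
    return max(range(len(values)), key=lambda i: values[i])


if __name__ == "__main__":
    # Invented softmax outputs of three unimodal models for one face image
    # (3 classes, values for illustration only).
    example = {
        "SA": [0.6, 0.3, 0.1],
        "AiF": [0.2, 0.5, 0.3],
        "depth": [0.5, 0.2, 0.3],
    }
    fused = decision_level_fusion(example)
    print([round(p, 4) for p in fused], argmax(fused))
```

In this toy case the AiF model alone would pick class 1, but averaging with the SA and depth outputs yields approximately [0.4333, 0.3333, 0.2333] and a fused prediction of class 0. Passing an explicit `weights` dictionary gives a weighted variant; normalizing by the weight total keeps the fused scores a valid probability distribution.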
