

VT-3DCapsNet: Visual tempos 3D-Capsule network for video-based facial expression recognition.

Affiliations

College of Information Engineering, Shanghai Maritime University, Shanghai, China.

National Engineering Research Center of Ship Transportation Control Systems, Shanghai Ship and Shipping Research Institute, Shanghai, China.

Publication information

PLoS One. 2024 Aug 23;19(8):e0307446. doi: 10.1371/journal.pone.0307446. eCollection 2024.

Abstract

Facial expression recognition (FER) is a hot topic in computer vision, especially as deep-learning-based methods gain traction in the field. However, traditional convolutional neural networks (CNNs) ignore the relative positions of key facial features (mouth, eyebrows, eyes, etc.) under the changes facial expressions undergo in real-world environments, such as rotation, displacement, or partial occlusion. In addition, most works in the literature do not take visual tempos into account when recognizing facial expressions with high mutual similarity. To address these issues, we propose a visual-tempos 3D-CapsNet framework (VT-3DCapsNet). First, we propose a 3D-CapsNet model for emotion recognition that introduces an improved 3D-ResNet architecture integrated with an AU-perceived attention module. This enhances the feature-representation ability of the capsule network by expressing deeper hierarchical spatiotemporal features and extracting latent information (position, size, orientation) in key facial areas. Furthermore, we propose the temporal pyramid network (TPN)-based expression recognition module (TPN-ERM), which learns high-level facial motion features from video frames to model differences in visual tempos, further improving the recognition accuracy of 3D-CapsNet. Extensive experiments are conducted on the Extended Cohn-Kanade (CK+) database and the Acted Facial Expressions in the Wild (AFEW) database. The results demonstrate competitive performance of our approach compared with other state-of-the-art methods.
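The core idea behind modeling visual tempos with a temporal pyramid can be illustrated, purely as an outside sketch and not the authors' implementation, by pooling per-frame features at several temporal sampling rates. The function name, the sampling rates, and the mean-pooling choice below are illustrative assumptions only.

```python
import numpy as np

def temporal_pyramid_features(frame_feats, rates=(1, 2, 4)):
    """Aggregate per-frame features at several temporal sampling rates.

    frame_feats: (T, D) array of per-frame feature vectors.
    Returns a (len(rates) * D,) vector concatenating a mean-pooled
    descriptor per rate, so the same expression performed at a fast
    or slow tempo yields comparable multi-scale summaries.
    """
    levels = []
    for r in rates:
        sampled = frame_feats[::r]           # subsample every r-th frame
        levels.append(sampled.mean(axis=0))  # pool over time at this tempo
    return np.concatenate(levels)

# toy example: a 16-frame clip with 8-dim per-frame features
feats = np.random.rand(16, 8)
desc = temporal_pyramid_features(feats)
print(desc.shape)  # (24,)
```

In the paper's actual TPN-ERM, the multi-rate features are learned from 3D convolutional feature maps rather than fixed mean pooling; this toy version only conveys why multiple temporal scales help distinguish similar expressions that differ mainly in speed.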


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6df0/11343406/2b7b4d89c03c/pone.0307446.g001.jpg
