Wu You, Mi Qingwei, Gao Tianhan
Software College, Northeastern University, Shenyang 110169, China.
Biomimetics (Basel). 2025 Jun 27;10(7):418. doi: 10.3390/biomimetics10070418.
This paper presents a comprehensive review of multimodal emotion recognition (MER), a process that integrates multiple data modalities, such as speech, vision, and text, to identify human emotions. Grounded in biomimetics, the survey frames MER as a bio-inspired sensing paradigm that emulates the way humans seamlessly fuse multisensory cues to communicate affect, thereby transferring principles from living systems to engineered solutions. By leveraging multiple modalities, MER systems offer a richer and more robust analysis of emotional states than unimodal approaches. The review covers the general structure of MER systems, feature extraction techniques, and multimodal information fusion strategies, highlighting key advancements and milestones. It also addresses open research challenges in MER, including lightweight models, cross-corpus generalizability, and the incorporation of additional modalities. The paper concludes by discussing future directions aimed at improving the accuracy, explainability, and practicality of MER systems in real-world applications.