

Similar Articles

1. Multi-attention Recurrent Network for Human Communication Comprehension.
Proc AAAI Conf Artif Intell. 2018 Feb;2018:5642-5649.
2. Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors.
Proc AAAI Conf Artif Intell. 2019 Jul;33(1):7216-7223.
3. Multimodal Sentiment Analysis Based on Cross-Modal Attention and Gated Cyclic Hierarchical Fusion Networks.
Comput Intell Neurosci. 2022 Aug 9;2022:4767437. doi: 10.1155/2022/4767437. eCollection 2022.
4. Integrating Multimodal Information in Large Pretrained Transformers.
Proc Conf Assoc Comput Linguist Meet. 2020 Jul;2020:2359-2369. doi: 10.18653/v1/2020.acl-main.214.
5. Learning interaction dynamics with an interactive LSTM for conversational sentiment analysis.
Neural Netw. 2021 Jan;133:40-56. doi: 10.1016/j.neunet.2020.10.001. Epub 2020 Oct 21.
6. Social eye gaze modulates processing of speech and co-speech gesture.
Cognition. 2014 Dec;133(3):692-7. doi: 10.1016/j.cognition.2014.08.008. Epub 2014 Sep 29.
7. Intention processing in communication: a common brain network for language and gestures.
J Cogn Neurosci. 2011 Sep;23(9):2415-31. doi: 10.1162/jocn.2010.21594. Epub 2010 Oct 18.
8. Metaphor processing is supramodal semantic processing: The role of the bilateral lateral temporal regions in multimodal communication.
Brain Lang. 2020 Jun;205:104772. doi: 10.1016/j.bandl.2020.104772. Epub 2020 Feb 29.
9. Realization of Self-Adaptive Higher Teaching Management Based Upon Expression and Speech Multimodal Emotion Recognition.
Front Psychol. 2022 Mar 28;13:857924. doi: 10.3389/fpsyg.2022.857924. eCollection 2022.
10. A multimodal human-robot sign language interaction framework applied in social robots.
Front Neurosci. 2023 Apr 11;17:1168888. doi: 10.3389/fnins.2023.1168888. eCollection 2023.

Articles Citing This Paper

1. Adaptive Graph Learning with Multimodal Fusion for Emotion Recognition in Conversation.
Biomimetics (Basel). 2025 Jun 25;10(7):414. doi: 10.3390/biomimetics10070414.
2. Analysis of the fusion of multimodal sentiment perception and physiological signals in Chinese-English cross-cultural communication: Transformer approach incorporating self-attention enhancement.
PeerJ Comput Sci. 2025 May 23;11:e2890. doi: 10.7717/peerj-cs.2890. eCollection 2025.
3. Building a Gender-Bias-Resistant Super Corpus as a Deep Learning Baseline for Speech Emotion Recognition.
Sensors (Basel). 2025 Mar 22;25(7):1991. doi: 10.3390/s25071991.
4. HGF-MiLaG: Hierarchical Graph Fusion for Emotion Recognition in Conversation with Mid-Late Gender-Aware Strategy.
Sensors (Basel). 2025 Feb 14;25(4):1182. doi: 10.3390/s25041182.
5. A Comprehensive Analysis of a Social Intelligence Dataset and Response Tendencies Between Large Language Models (LLMs) and Humans.
Sensors (Basel). 2025 Jan 15;25(2):477. doi: 10.3390/s25020477.
6. A multigrained preference analysis method for product iterative design incorporating AI-generated review detection.
Sci Rep. 2025 Jan 20;15(1):2528. doi: 10.1038/s41598-025-86551-5.
7. Advances and prospects of multi-modal ophthalmic artificial intelligence based on deep learning: a review.
Eye Vis (Lond). 2024 Oct 1;11(1):38. doi: 10.1186/s40662-024-00405-1.
8. Audio-Based Emotion Recognition Using Self-Supervised Learning on an Engineered Feature Space.
AI (Basel). 2024 Mar;5(1):195-207. doi: 10.3390/ai5010011. Epub 2024 Jan 17.
9. Merge-and-Split Graph Convolutional Network for Skeleton-Based Interaction Recognition.
Cyborg Bionic Syst. 2024 Mar 20;5:0102. doi: 10.34133/cbsystems.0102. eCollection 2024.
10. A Survey of Deep Learning-Based Multimodal Emotion Recognition: Speech, Text, and Face.
Entropy (Basel). 2023 Oct 12;25(10):1440. doi: 10.3390/e25101440.

References Cited in This Paper

1. Neural synchronization during face-to-face communication.
J Neurosci. 2012 Nov 7;32(45):16064-9. doi: 10.1523/JNEUROSCI.2926-12.2012.
2. Imaging first impressions: distinct neural processing of verbal and nonverbal social information.
Neuroimage. 2012 Mar;60(1):179-88. doi: 10.1016/j.neuroimage.2011.12.046. Epub 2011 Dec 27.
3. Hidden conditional random fields.
IEEE Trans Pattern Anal Mach Intell. 2007 Oct;29(10):1848-53. doi: 10.1109/TPAMI.2007.1124.
4. Long short-term memory.
Neural Comput. 1997 Nov 15;9(8):1735-80. doi: 10.1162/neco.1997.9.8.1735.
5. Functional and anatomical decomposition of face processing: evidence from prosopagnosia and PET study of normal subjects.
Philos Trans R Soc Lond B Biol Sci. 1992 Jan 29;335(1273):55-61; discussion 61-2. doi: 10.1098/rstb.1992.0007.


Multi-attention Recurrent Network for Human Communication Comprehension.

Author Information

Zadeh Amir, Liang Paul Pu, Poria Soujanya, Vij Prateek, Cambria Erik, Morency Louis-Philippe

Affiliations

Carnegie Mellon University, USA.

NTU, Singapore.

Publication Information

Proc AAAI Conf Artif Intell. 2018 Feb;2018:5642-5649.

PMID: 32257595
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7136010/
Abstract

Human face-to-face communication is a complex multimodal signal. We use words (language modality), gestures (vision modality) and changes in tone (acoustic modality) to convey our intentions. Humans easily process and understand face-to-face communication; for Artificial Intelligence (AI), however, comprehending this form of communication remains a significant challenge. AI must understand each modality and the interactions between them that shape the communication. In this paper, we present a novel neural architecture for understanding human communication called the Multi-attention Recurrent Network (MARN). The main strength of our model comes from discovering interactions between modalities through time using a neural component called the Multi-attention Block (MAB) and storing them in the hybrid memory of a recurrent component called the Long-short Term Hybrid Memory (LSTHM). We perform extensive comparisons on six publicly available datasets for multimodal sentiment analysis, speaker trait recognition and emotion recognition. MARN shows state-of-the-art performance on all the datasets.

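The core mechanism the abstract describes, several parallel attention distributions computed over the concatenated per-modality hidden states, can be sketched in a few lines. This is a minimal illustrative sketch in NumPy, not the authors' implementation: the function name `multi_attention_block`, the dimensions, and the single-matrix parameterization of each attention are assumptions for exposition, and the LSTHM recurrence that would consume the output is omitted.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def multi_attention_block(h_lang, h_vis, h_ac, W_att):
    """Sketch of a Multi-attention Block (MAB).

    Computes K attention distributions over the concatenated
    per-modality hidden states and returns K dimension-wise
    weighted codes capturing cross-modal interactions.
    W_att has shape (K, d, d), where d = len(h_lang) + len(h_vis) + len(h_ac).
    """
    h = np.concatenate([h_lang, h_vis, h_ac])    # joint hidden state, shape (d,)
    codes = [softmax(W @ h) * h for W in W_att]  # one attended code per attention head
    return np.concatenate(codes)                 # shape (K * d,)

# Toy usage with random per-modality hidden states and K = 2 attentions.
rng = np.random.default_rng(0)
h_l, h_v, h_a = rng.normal(size=4), rng.normal(size=3), rng.normal(size=3)
W = rng.normal(size=(2, 10, 10))
z = multi_attention_block(h_l, h_v, h_a, W)
print(z.shape)  # (20,)
```

In the full model, an output like `z` would be passed, together with the next time step's inputs, into the hybrid memory update of each modality's LSTHM; only the attention step is shown here.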