A Parallel Multi-Modal Factorized Bilinear Pooling Fusion Method Based on the Semi-Tensor Product for Emotion Recognition.

Authors

Liu Fen, Chen Jianfeng, Li Kemeng, Tan Weijie, Cai Chang, Ayub Muhammad Saad

Affiliations

School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an 710072, China.

College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China.

Publication

Entropy (Basel). 2022 Dec 16;24(12):1836. doi: 10.3390/e24121836.

DOI: 10.3390/e24121836
PMID: 36554241
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9777841/
Abstract

Multi-modal fusion can exploit complementary information from various modalities and improve the accuracy of prediction or classification tasks. In this paper, we propose a parallel, multi-modal, factorized, bilinear pooling method based on a semi-tensor product (STP) for information fusion in emotion recognition. Initially, we apply the STP to factorize a high-dimensional weight matrix into two low-rank factor matrices without dimension matching constraints. Next, we project the multi-modal features to the low-dimensional matrices and perform multiplication based on the STP to capture the rich interactions between the features. Finally, we utilize an STP-pooling method to reduce the dimensionality to get the final features. This method can achieve the information fusion between modalities of different scales and dimensions and avoids data redundancy due to dimension matching. Experimental verification of the proposed method on the emotion-recognition task using the IEMOCAP and CMU-MOSI datasets showed a significant reduction in storage space and recognition time. The results also validate that the proposed method improves the performance and reduces both the training time and the number of parameters.
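The abstract's central operation is the semi-tensor product (STP), which generalizes matrix multiplication to factors whose inner dimensions need not be equal, only share a common multiple; this is what lets the method fuse modality features of different scales without dimension-matching projections. Below is a minimal NumPy sketch of that operation under the standard STP definition; the function name, shapes, and toy matrices are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def semi_tensor_product(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Left semi-tensor product A ⋉ B (illustrative, not the paper's code).

    For A of shape (m, n) and B of shape (p, q), let t = lcm(n, p); then
    A ⋉ B = (A ⊗ I_{t/n}) @ (B ⊗ I_{t/p}), which reduces to the ordinary
    matrix product when n == p.
    """
    _, n = A.shape
    p, _ = B.shape
    t = int(np.lcm(n, p))
    return np.kron(A, np.eye(t // n)) @ np.kron(B, np.eye(t // p))

# The inner dimensions (4 and 6) do not match, yet the STP is defined:
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # e.g. a projected audio feature block
B = rng.standard_normal((6, 5))   # e.g. a projected text feature block
C = semi_tensor_product(A, B)
print(C.shape)  # (9, 10): t = lcm(4, 6) = 12, giving (3*12/4, 5*12/6)
```

Because the factors only need a common multiple of their inner dimensions, the low-rank projections of each modality can stay at their natural sizes, which is the dimension-matching redundancy the abstract says the method avoids.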


Figures (PMC full text):
Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2bb8/9777841/76b0d6eea42a/entropy-24-01836-g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2bb8/9777841/4d1d940cfa0b/entropy-24-01836-g002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2bb8/9777841/a0af88663148/entropy-24-01836-g003.jpg

Similar Articles

1. A Parallel Multi-Modal Factorized Bilinear Pooling Fusion Method Based on the Semi-Tensor Product for Emotion Recognition.
Entropy (Basel). 2022 Dec 16;24(12):1836. doi: 10.3390/e24121836.
2. Research on cross-modal emotion recognition based on multi-layer semantic fusion.
Math Biosci Eng. 2024 Jan 17;21(2):2488-2514. doi: 10.3934/mbe.2024110.
3. Multimodal Emotion Recognition Based on Cascaded Multichannel and Hierarchical Fusion.
Comput Intell Neurosci. 2023 Jan 5;2023:9645611. doi: 10.1155/2023/9645611. eCollection 2023.
4. A Multi-Modal Fusion Method Based on Higher-Order Orthogonal Iteration Decomposition.
Entropy (Basel). 2021 Oct 15;23(10):1349. doi: 10.3390/e23101349.
5. Multi-Modal Fusion Emotion Recognition Method of Speech Expression Based on Deep Learning.
Front Neurorobot. 2021 Jul 9;15:697634. doi: 10.3389/fnbot.2021.697634. eCollection 2021.
6. Joint low-rank tensor fusion and cross-modal attention for multimodal physiological signals based emotion recognition.
Physiol Meas. 2024 Jul 11;45(7). doi: 10.1088/1361-6579/ad5bbc.
7. GCF-Net: global-aware cross-modal feature fusion network for speech emotion recognition.
Front Neurosci. 2023 May 4;17:1183132. doi: 10.3389/fnins.2023.1183132. eCollection 2023.
8. Transformer-Based Multi-Modal Data Fusion Method for COPD Classification and Physiological and Biochemical Indicators Identification.
Biomolecules. 2023 Sep 15;13(9):1391. doi: 10.3390/biom13091391.
9. Multimodal Feature Fusion Method for Unbalanced Sample Data in Social Network Public Opinion.
Sensors (Basel). 2022 Jul 25;22(15):5528. doi: 10.3390/s22155528.
10. A multi-stage dynamical fusion network for multimodal emotion recognition.
Cogn Neurodyn. 2023 Jun;17(3):671-680. doi: 10.1007/s11571-022-09851-w. Epub 2022 Jul 31.

Cited By

1. Enhancing Human Activity Recognition through Integrated Multimodal Analysis: A Focus on RGB Imaging, Skeletal Tracking, and Pose Estimation.
Sensors (Basel). 2024 Jul 17;24(14):4646. doi: 10.3390/s24144646.
2. A Survey of Deep Learning-Based Multimodal Emotion Recognition: Speech, Text, and Face.
Entropy (Basel). 2023 Oct 12;25(10):1440. doi: 10.3390/e25101440.

References Cited in This Article

1. A Multi-Modal Fusion Method Based on Higher-Order Orthogonal Iteration Decomposition.
Entropy (Basel). 2021 Oct 15;23(10):1349. doi: 10.3390/e23101349.
2. Multimodal Transformer for Unaligned Multimodal Language Sequences.
Proc Conf Assoc Comput Linguist Meet. 2019 Jul;2019:6558-6569. doi: 10.18653/v1/p19-1656.
3. Multi-attention Recurrent Network for Human Communication Comprehension.
Proc AAAI Conf Artif Intell. 2018 Feb;2018:5642-5649.
4. Multimodal Machine Learning: A Survey and Taxonomy.
IEEE Trans Pattern Anal Mach Intell. 2019 Feb;41(2):423-443. doi: 10.1109/TPAMI.2018.2798607. Epub 2018 Jan 25.
5. Video2vec Embeddings Recognize Events When Examples Are Scarce.
IEEE Trans Pattern Anal Mach Intell. 2017 Oct;39(10):2089-2103. doi: 10.1109/TPAMI.2016.2627563. Epub 2016 Nov 10.
6. Some mathematical notes on three-mode factor analysis.
Psychometrika. 1966 Sep;31(3):279-311. doi: 10.1007/BF02289464.