A unified multimodal classification framework based on deep metric learning.

Authors

Peng Liwen, Jian Songlei, Li Minne, Kan Zhigang, Qiao Linbo, Li Dongsheng

Affiliations

Intelligent Game and Decision Lab, Beijing, 100080, China; College of Computer, National University of Defense Technology, Changsha Hunan 410073, China.

College of Computer, National University of Defense Technology, Changsha Hunan 410073, China.

Publication information

Neural Netw. 2025 Jan;181:106747. doi: 10.1016/j.neunet.2024.106747. Epub 2024 Oct 4.

DOI:10.1016/j.neunet.2024.106747

PMID:39369458

Abstract

Multimodal classification algorithms play an essential role in multimodal machine learning: they categorize data points by analyzing features from multiple modalities. Extensive research has addressed extracting multimodal features and devising specialized fusion strategies for particular classification tasks. However, existing algorithms mostly target a single classification task and handle only the modalities that task involves. To address these limitations, we propose a unified multimodal classification framework (UMCF) that handles diverse multimodal classification tasks and processes data from disparate modalities. UMCF is task-independent, and its unimodal feature extraction module can be adaptively replaced to accommodate data from different modalities. Moreover, we build the multimodal learning scheme on deep metric learning to mine latent characteristics within multimodal data. Specifically, we design metric-based triplet learning to extract the intra-modal relationships within each modality and contrastive pairwise learning to capture the inter-modal relationships across modalities. Extensive experiments on two multimodal classification tasks, fake news detection and sentiment analysis, demonstrate that UMCF can extract multimodal data features and achieve better classification performance than task-specific benchmarks. UMCF outperforms the best fake news detection baselines by 2.3% on average in F1 score.
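
The abstract names two metric-learning objectives without giving their formulas: triplet learning within a modality and contrastive pairwise learning across modalities. The sketch below is only an illustration of what such objectives commonly look like. It assumes the standard triplet margin loss and the standard margin-based pairwise contrastive loss, Euclidean distance, a margin of 1.0, and toy 2-D embeddings. All function names, the margin, and the pairing of a text embedding with an image embedding are hypothetical and are not taken from the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (assumed form, not the paper's).

    Zero once the same-class sample is closer to the anchor than the
    different-class sample by at least `margin`.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


def contrastive_pair_loss(x, y, is_match, margin=1.0):
    """Standard pairwise contrastive loss (assumed form, not the paper's).

    Matched pairs are penalized by their squared distance; mismatched
    pairs are penalized only while closer than `margin`.
    """
    d = euclidean(x, y)
    if is_match:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Toy embeddings (hypothetical values).
text_anchor = [0.0, 0.0]
text_same_class = [0.0, 1.0]
text_other_class = [3.0, 4.0]
image_same_sample = [0.0, 0.5]

# Intra-modal term: all three embeddings come from the text modality.
intra = triplet_loss(text_anchor, text_same_class, text_other_class)

# Inter-modal term: text and image embeddings assumed to describe one sample.
inter = contrastive_pair_loss(text_anchor, image_same_sample, is_match=True)

total = intra + inter
print(intra, inter, total)
```

In a full model these two terms would typically be added to a classification loss, and the embeddings would come from whichever unimodal encoders are plugged in. How UMCF weights or combines them is not stated in the abstract.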

Similar articles

Modality Perception Learning-Based Determinative Factor Discovery for Multimodal Fake News Detection.

IEEE Trans Neural Netw Learn Syst. 2024 Sep 20;PP. doi: 10.1109/TNNLS.2024.3446030.

Text-image multimodal fusion model for enhanced fake news detection.

Sci Prog. 2024 Oct-Dec;107(4):368504241292685. doi: 10.1177/00368504241292685.

CLAAF: Multimodal fake information detection based on contrastive learning and adaptive Agg-modality fusion.

PLoS One. 2025 May 7;20(5):e0322556. doi: 10.1371/journal.pone.0322556. eCollection 2025.

Imaging-genomic spatial-modality attentive fusion for studying neuropsychiatric disorders.

Hum Brain Mapp. 2024 Dec 1;45(17):e26799. doi: 10.1002/hbm.26799.

Detection of Fake News Text Classification on COVID-19 Using Deep Learning Approaches.

Comput Math Methods Med. 2021 Nov 15;2021:5514220. doi: 10.1155/2021/5514220. eCollection 2021.

Attention-based multimodal fusion with contrast for robust clinical prediction in the face of missing modalities.

J Biomed Inform. 2023 Sep;145:104466. doi: 10.1016/j.jbi.2023.104466. Epub 2023 Aug 5.

Weighted Multi-Modal Contrastive Learning Based Hybrid Network for Alzheimer's Disease Diagnosis.

IEEE Trans Neural Syst Rehabil Eng. 2025;33:1135-1144. doi: 10.1109/TNSRE.2025.3549730. Epub 2025 Mar 19.

TLFND: A Multimodal Fusion Model Based on Three-Level Feature Matching Distance for Fake News Detection.

Entropy (Basel). 2023 Nov 10;25(11):1533. doi: 10.3390/e25111533.

Uncertainty-Aware Graph Contrastive Fusion Network for multimodal physiological signal emotion recognition.

Neural Netw. 2025 Jul;187:107363. doi: 10.1016/j.neunet.2025.107363. Epub 2025 Mar 14.
