Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A. Lee, Yuke Zhu, Ruslan Salakhutdinov, Louis-Philippe Morency
Carnegie Mellon University; Johns Hopkins University.
Adv Neural Inf Process Syst. 2021 Dec;2021(DB1):1-20.
Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics, finance, human-computer interaction, and healthcare. Unfortunately, multimodal research has seen limited resources for studying (1) generalization across domains and modalities, (2) complexity during training and inference, and (3) robustness to noisy and missing modalities. In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiBench, a systematic and unified large-scale benchmark for multimodal learning spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. MultiBench provides an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, MultiBench offers a comprehensive methodology to assess (1) generalization, (2) time and space complexity, and (3) modality robustness. MultiBench introduces impactful challenges for future research, including scalability to large-scale multimodal datasets and robustness to realistic imperfections. To accompany this benchmark, we also provide a standardized implementation of 20 core approaches in multimodal learning spanning innovations in fusion paradigms, optimization objectives, and training approaches. Simply applying methods proposed in different research areas can improve state-of-the-art performance on 9/15 datasets. Therefore, MultiBench presents a milestone in unifying disjoint efforts in multimodal machine learning research and paves the way towards a better understanding of the capabilities and limitations of multimodal models, all the while ensuring ease of use, accessibility, and reproducibility.
MultiBench, our standardized implementations, and leaderboards are publicly available, will be regularly updated, and welcome input from the community.
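The modality-robustness axis described in the abstract — measuring how performance degrades as one modality becomes noisy or goes missing — can be illustrated with a minimal, self-contained sketch. This is not MultiBench's actual API: the toy two-modality dataset, the least-squares "training", and the `late_fusion_score` averaging scheme below are all hypothetical stand-ins chosen only to show the shape of such an evaluation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-modality dataset: both feature blocks carry the label signal,
# but modality B is noisier than modality A.
n, d = 400, 8
y = rng.integers(0, 2, size=n)
mod_a = y[:, None] + 0.5 * rng.standard_normal((n, d))
mod_b = y[:, None] + 1.0 * rng.standard_normal((n, d))

# "Train" one linear scorer per modality via least squares (illustrative only).
w_a, *_ = np.linalg.lstsq(mod_a, y.astype(float), rcond=None)
w_b, *_ = np.linalg.lstsq(mod_b, y.astype(float), rcond=None)

def late_fusion_score(a, b):
    """Late fusion: score each modality independently, then average the scores."""
    return (a @ w_a + b @ w_b) / 2.0

def accuracy(a, b):
    return float(((late_fusion_score(a, b) > 0.5).astype(int) == y).mean())

# Robustness probes, in the spirit of the benchmark's evaluation methodology:
clean_acc   = accuracy(mod_a, mod_b)                                      # unperturbed
missing_acc = accuracy(mod_a, np.zeros_like(mod_b))                       # modality B missing
noisy_acc   = accuracy(mod_a, mod_b + 3.0 * rng.standard_normal(mod_b.shape))  # modality B corrupted

print(f"clean={clean_acc:.3f}  missing-B={missing_acc:.3f}  noisy-B={noisy_acc:.3f}")
```

Plotting accuracy against increasing noise levels (or against the number of dropped modalities) yields the kind of robustness curve a benchmark along these lines would report; the naive averaging fusion here degrades sharply when a modality disappears, which is exactly the failure mode such an evaluation is meant to expose.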