Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A. Lee, Yuke Zhu, Ruslan Salakhutdinov, Louis-Philippe Morency
Carnegie Mellon University; Johns Hopkins University.
Adv Neural Inf Process Syst. 2021 Dec;2021(DB1):1-20.
Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics, finance, human-computer interaction, and healthcare. Unfortunately, multimodal research has seen limited resources for studying (1) generalization across domains and modalities, (2) complexity during training and inference, and (3) robustness to noisy and missing modalities. In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiBench, a systematic and unified large-scale benchmark for multimodal learning spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. MultiBench provides an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, MultiBench offers a comprehensive methodology to assess (1) generalization, (2) time and space complexity, and (3) modality robustness. MultiBench introduces impactful challenges for future research, including scalability to large-scale multimodal datasets and robustness to realistic imperfections. To accompany this benchmark, we also provide a standardized implementation of 20 core approaches in multimodal learning spanning innovations in fusion paradigms, optimization objectives, and training approaches. Simply applying methods proposed in different research areas can improve state-of-the-art performance on 9/15 datasets. Therefore, MultiBench presents a milestone in unifying disjoint efforts in multimodal machine learning research and paves the way towards a better understanding of the capabilities and limitations of multimodal models, all the while ensuring ease of use, accessibility, and reproducibility.
MultiBench, our standardized implementations, and leaderboards are publicly available, will be regularly updated, and welcome input from the community.
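The modality-robustness axis described in the abstract — measuring how performance degrades as one modality becomes noisy or goes missing — can be illustrated with a minimal, self-contained sketch. This is not MultiBench's actual API: the toy two-modality dataset, the least-squares "training", and the `late_fusion_score` averaging scheme below are all hypothetical stand-ins chosen only to show the shape of such an evaluation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-modality dataset: both feature blocks carry the label signal,
# but modality B is noisier than modality A.
n, d = 400, 8
y = rng.integers(0, 2, size=n)
mod_a = y[:, None] + 0.5 * rng.standard_normal((n, d))
mod_b = y[:, None] + 1.0 * rng.standard_normal((n, d))

# "Train" one linear scorer per modality via least squares (illustrative only).
w_a, *_ = np.linalg.lstsq(mod_a, y.astype(float), rcond=None)
w_b, *_ = np.linalg.lstsq(mod_b, y.astype(float), rcond=None)

def late_fusion_score(a, b):
    """Late fusion: score each modality independently, then average the scores."""
    return (a @ w_a + b @ w_b) / 2.0

def accuracy(a, b):
    return float(((late_fusion_score(a, b) > 0.5).astype(int) == y).mean())

# Robustness probes, in the spirit of the benchmark's evaluation methodology:
clean_acc   = accuracy(mod_a, mod_b)                                      # unperturbed
missing_acc = accuracy(mod_a, np.zeros_like(mod_b))                       # modality B missing
noisy_acc   = accuracy(mod_a, mod_b + 3.0 * rng.standard_normal(mod_b.shape))  # modality B corrupted

print(f"clean={clean_acc:.3f}  missing-B={missing_acc:.3f}  noisy-B={noisy_acc:.3f}")
```

Plotting accuracy against increasing noise levels (or against the number of dropped modalities) yields the kind of robustness curve a benchmark along these lines would report; the naive averaging fusion here degrades sharply when a modality disappears, which is exactly the failure mode such an evaluation is meant to expose.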