

Learning More May Not Be Better: Knowledge Transferability in Vision-and-Language Tasks.

Authors

Chen Tianwei, Garcia Noa, Otani Mayu, Chu Chenhui, Nakashima Yuta, Nagahara Hajime

Affiliations

Institute for Datability Science, Osaka University, Osaka 565-0871, Japan.

CyberAgent Inc., Tokyo 150-0042, Japan.

Publication

J Imaging. 2024 Nov 22;10(12):300. doi: 10.3390/jimaging10120300.

Abstract

Is learning more knowledge always better for vision-and-language models? In this paper, we study knowledge transferability in multi-modal tasks. The current tendency in machine learning is to assume that by joining multiple datasets from different tasks, their overall performance improves. However, we show that not all knowledge transfers well or has a positive impact on related tasks, even when they share a common goal. We conducted an exhaustive analysis based on hundreds of cross-experiments on twelve vision-and-language tasks categorized into four groups. While tasks in the same group are prone to improve each other, results show that this is not always the case. In addition, other factors, such as dataset size or the pre-training stage, may have a great impact on how well the knowledge is transferred.
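The cross-experiment protocol the abstract describes (joining a source task's dataset with a target task's dataset, then checking whether the target task improves or degrades) can be summarized in a short sketch. The following Python/PyTorch snippet is a hypothetical illustration, not the authors' code; the helper callables build_model, train_fn, and eval_fn, as well as the task datasets, are assumed placeholders supplied by the caller.

    import itertools
    from torch.utils.data import ConcatDataset, DataLoader

    def run_transfer_grid(datasets, build_model, train_fn, eval_fn):
        # datasets: dict mapping task name -> (train_set, val_set)
        results = {}
        for source, target in itertools.product(datasets, repeat=2):
            src_train, _ = datasets[source]
            tgt_train, tgt_val = datasets[target]
            # Joint training on source + target data; when source == target,
            # this reduces to the single-task baseline.
            joint = ConcatDataset([src_train, tgt_train]) if source != target else tgt_train
            model = build_model()
            train_fn(model, DataLoader(joint, batch_size=32, shuffle=True))
            # Score on the target task only, so each cell of the grid answers:
            # did adding the source task's data help or hurt the target task?
            results[(source, target)] = eval_fn(model, DataLoader(tgt_val, batch_size=32))
        return results

Comparing each off-diagonal cell of the resulting grid against its diagonal baseline is one way to read off which task pairs transfer knowledge positively and which do not, in the spirit of the hundreds of cross-experiments reported in the paper.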


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b72/11676753/fac0a665576c/jimaging-10-00300-g001.jpg
