

Learning More May Not Be Better: Knowledge Transferability in Vision-and-Language Tasks.

Authors

Chen Tianwei, Garcia Noa, Otani Mayu, Chu Chenhui, Nakashima Yuta, Nagahara Hajime

Affiliations

Institute for Datability Science, Osaka University, Osaka 565-0871, Japan.

CyberAgent Inc., Tokyo 150-0042, Japan.

Publication

J Imaging. 2024 Nov 22;10(12):300. doi: 10.3390/jimaging10120300.

Abstract

Is learning more knowledge always better for vision-and-language models? In this paper, we study knowledge transferability in multi-modal tasks. The current tendency in machine learning is to assume that by joining multiple datasets from different tasks, their overall performance improves. However, we show that not all knowledge transfers well or has a positive impact on related tasks, even when they share a common goal. We conducted an exhaustive analysis based on hundreds of cross-experiments on twelve vision-and-language tasks categorized into four groups. While tasks in the same group are prone to improve each other, results show that this is not always the case. In addition, other factors, such as dataset size or the pre-training stage, may have a great impact on how well the knowledge is transferred.
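The cross-experiment protocol the abstract describes (joining a source task's dataset with a target task's dataset, then checking whether the target task improves or degrades) can be summarized in a short sketch. The following Python/PyTorch snippet is a hypothetical illustration, not the authors' code; the helper callables build_model, train_fn, and eval_fn, as well as the task datasets, are assumed placeholders supplied by the caller.

    import itertools
    from torch.utils.data import ConcatDataset, DataLoader

    def run_transfer_grid(datasets, build_model, train_fn, eval_fn):
        # datasets: dict mapping task name -> (train_set, val_set)
        results = {}
        for source, target in itertools.product(datasets, repeat=2):
            src_train, _ = datasets[source]
            tgt_train, tgt_val = datasets[target]
            # Joint training on source + target data; when source == target,
            # this reduces to the single-task baseline.
            joint = ConcatDataset([src_train, tgt_train]) if source != target else tgt_train
            model = build_model()
            train_fn(model, DataLoader(joint, batch_size=32, shuffle=True))
            # Score on the target task only, so each cell of the grid answers:
            # did adding the source task's data help or hurt the target task?
            results[(source, target)] = eval_fn(model, DataLoader(tgt_val, batch_size=32))
        return results

Comparing each off-diagonal cell of the resulting grid against its diagonal baseline is one way to read off which task pairs transfer knowledge positively and which do not, in the spirit of the hundreds of cross-experiments reported in the paper.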


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b72/11676753/fac0a665576c/jimaging-10-00300-g001.jpg
