Chen Tianwei, Garcia Noa, Otani Mayu, Chu Chenhui, Nakashima Yuta, Nagahara Hajime
Institute for Datability Science, Osaka University, Osaka 565-0871, Japan.
CyberAgent Inc., Tokyo 150-0042, Japan.
J Imaging. 2024 Nov 22;10(12):300. doi: 10.3390/jimaging10120300.
Is learning more knowledge always better for vision-and-language models? In this paper, we study knowledge transferability in multi-modal tasks. A common assumption in machine learning is that combining multiple datasets from different tasks improves overall performance. However, we show that not all knowledge transfers well or has a positive impact on related tasks, even when the tasks share a common goal. We conducted an exhaustive analysis based on hundreds of cross-experiments on twelve vision-and-language tasks categorized into four groups. Although tasks within the same group tend to improve each other, our results show that this is not always the case. In addition, other factors, such as dataset size or the pre-training stage, can strongly affect how well knowledge is transferred.