Eye-Gaze-Guided Vision Transformer for Rectifying Shortcut Learning.

Publication information

IEEE Trans Med Imaging. 2023 Nov;42(11):3384-3394. doi: 10.1109/TMI.2023.3287572. Epub 2023 Oct 27.

DOI:10.1109/TMI.2023.3287572

Abstract

Learning harmful shortcuts such as spurious correlations and biases prevents deep neural networks from learning meaningful and useful representations, thus jeopardizing the generalizability and interpretability of the learned representations. The problem is even more serious in medical image analysis, where clinical data are scarce while the reliability, generalizability, and transparency of the learned model are essential. To rectify harmful shortcuts in medical imaging applications, we propose a novel eye-gaze-guided vision transformer (EG-ViT) model that infuses radiologists' visual attention to proactively guide the vision transformer (ViT) to focus on regions with potential pathology rather than on spurious correlations. To do so, the EG-ViT model takes as input the masked image patches that fall within the radiologists' regions of interest and adds a residual connection to the last encoder layer to maintain the interactions among all patches. Experiments on two medical imaging datasets demonstrate that the proposed EG-ViT model can effectively rectify harmful shortcut learning and improve the interpretability of the model. Infusing the experts' domain knowledge also improves the large-scale ViT model's performance over all compared baseline methods when only limited samples are available. Overall, EG-ViT takes advantage of powerful deep neural networks while rectifying harmful shortcut learning with human experts' prior knowledge. This work also opens new avenues for advancing current artificial intelligence paradigms by infusing human intelligence.
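
The patch-selection idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the gaze mask is computed, so everything here is an assumption made for illustration: the function name `select_gaze_patches`, the use of a per-pixel gaze heatmap, the mean-intensity rule, and the `threshold` parameter are hypothetical and are not taken from the paper. The residual connection to the last encoder layer is not modeled.

```python
from typing import List, Tuple


def select_gaze_patches(
    heatmap: List[List[float]],
    patch_size: int,
    threshold: float,
) -> List[Tuple[int, int]]:
    """Return (row, col) grid indices of patches whose mean gaze
    intensity is at least `threshold`.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    height = len(heatmap)
    width = len(heatmap[0])
    if height % patch_size or width % patch_size:
        raise ValueError("heatmap dimensions must be divisible by patch_size")

    kept = []
    for pr in range(height // patch_size):
        for pc in range(width // patch_size):
            # Sum the gaze intensity over every pixel in this patch.
            total = 0.0
            for r in range(pr * patch_size, (pr + 1) * patch_size):
                for c in range(pc * patch_size, (pc + 1) * patch_size):
                    total += heatmap[r][c]
            mean = total / (patch_size * patch_size)
            # Assumed rule: keep a patch if its mean gaze intensity is high enough.
            if mean >= threshold:
                kept.append((pr, pc))
    return kept


# Toy 4x4 gaze heatmap split into 2x2 patches; only the top-left patch is fixated.
toy_heatmap = [
    [0.9, 0.8, 0.0, 0.0],
    [0.7, 0.6, 0.0, 0.1],
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0],
]
kept_patches = select_gaze_patches(toy_heatmap, patch_size=2, threshold=0.5)
```

In this toy case only the top-left patch passes the threshold, so a ViT fed with `kept_patches` would see a single patch token. Per the abstract, the model additionally keeps a residual path to the last encoder layer so that interactions among all patches are not lost.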

