Quantifying interpretation reproducibility in Vision Transformer models with TAVAC.

Author information

Zhao Yue, Agyemang Dylan, Liu Yang, Mahoney Matt, Li Sheng

Affiliations

The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.

Department of Mathematics and Statistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.

Publication information

Sci Adv. 2024 Dec 20;10(51):eabg0264. doi: 10.1126/sciadv.abg0264.

DOI:10.1126/sciadv.abg0264

PMID:39705362

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11661421/

Abstract

Deep learning algorithms can extract meaningful diagnostic features from biomedical images, promising improved patient care in digital pathology. Vision Transformer (ViT) models capture long-range spatial relationships and offer robust prediction power and better interpretability for image classification tasks than convolutional neural network models. However, limited annotated biomedical imaging datasets can cause ViT models to overfit, leading to false predictions due to random noise. To address this, we introduce Training Attention and Validation Attention Consistency (TAVAC), a metric for evaluating ViT model overfitting and quantifying interpretation reproducibility. TAVAC compares high-attention regions between training and testing; we evaluated it on four public image classification datasets and two independent breast cancer histological image datasets. Overfitted models showed significantly lower TAVAC scores. TAVAC also distinguishes off-target from on-target attention and measures interpretation generalization at a fine-grained cellular level. Beyond diagnostics, TAVAC enhances interpretative reproducibility in basic research, revealing critical spatial patterns and cellular structures in biomedical and general nonbiomedical images.
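
The abstract describes TAVAC as a comparison of high-attention regions between training and testing but does not give the formula. The sketch below is only an illustration of that idea, not the authors' implementation: it assumes each image has two patch-level attention maps of equal shape (one obtained while the image was in the training split, one while it was held out) and scores their agreement in two hypothetical ways, a Pearson correlation and an overlap of the top-attention patches. The function names and the `top_fraction` parameter are assumptions introduced here.

```python
import numpy as np


def _flatten_pair(train_attn, val_attn):
    """Flatten two attention maps and check that their shapes agree."""
    a = np.asarray(train_attn, dtype=float)
    b = np.asarray(val_attn, dtype=float)
    if a.shape != b.shape:
        raise ValueError("attention maps must have the same shape")
    return a.ravel(), b.ravel()


def pearson_consistency(train_attn, val_attn):
    """Pearson correlation between two attention maps of the same image.

    Assumed stand-in for a consistency score; returns 0.0 when either
    map is constant, because the correlation is undefined in that case.
    """
    a, b = _flatten_pair(train_attn, val_attn)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def top_region_overlap(train_attn, val_attn, top_fraction=0.2):
    """Intersection-over-union of the highest-attention patches.

    `top_fraction` (hypothetical parameter) sets how many patches count
    as "high attention" in each map.
    """
    a, b = _flatten_pair(train_attn, val_attn)
    k = max(1, int(round(top_fraction * a.size)))
    top_a = set(np.argsort(a)[-k:].tolist())
    top_b = set(np.argsort(b)[-k:].tolist())
    return len(top_a & top_b) / len(top_a | top_b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.random((14, 14))                      # e.g. a 14 x 14 patch grid
    consistent = base + 0.05 * rng.random((14, 14))  # nearly the same attention
    unrelated = rng.random((14, 14))                 # attention driven by noise
    print("consistent:", pearson_consistency(base, consistent))
    print("unrelated: ", pearson_consistency(base, unrelated))
```

Under these assumptions, a model whose attention reflects real image structure should score near 1 on both measures, while attention that shifts between the training and held-out conditions should score near 0, which matches the abstract's report that overfitted models show lower consistency.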
