
MoViT: Memorizing Vision Transformers for Medical Image Analysis.

Authors

Shen Yiqing, Guo Pengfei, Wu Jingpu, Huang Qianqi, Le Nhat, Zhou Jinyuan, Jiang Shanshan, Unberath Mathias

Affiliations

Johns Hopkins University, Baltimore, USA.

Publication

Mach Learn Med Imaging. 2024;14349:205-213. doi: 10.1007/978-3-031-45676-3_21. Epub 2023 Oct 15.

Abstract

The synergy of long-range dependencies from transformers and local representations of image content from convolutional neural networks (CNNs) has led to advanced architectures and increased performance on various medical image analysis tasks, owing to their complementary benefits. However, compared with CNNs, transformers require considerably more training data, due to their larger number of parameters and absence of inductive bias. The need for increasingly large datasets continues to be problematic, particularly in medical imaging, where both annotation effort and data protection limit data availability. In this work, inspired by the human decision-making process of correlating new "evidence" with previously memorized "experience", we propose the Memorizing Vision Transformer (MoViT) to alleviate the need for large-scale datasets to successfully train and deploy transformer-based architectures. MoViT leverages an external memory structure to cache historical attention snapshots during the training stage. To prevent overfitting, we incorporate a memory update scheme, attention temporal moving average, which refreshes the stored external memories with a historical moving average. To speed up inference, we design a prototypical attention learning method that distills the external memory into smaller, representative subsets. We evaluate our method on a public histology image dataset and an in-house MRI dataset, demonstrating that MoViT, applied to varied medical image analysis tasks, can outperform vanilla transformer models across varied data regimes, especially where only a small amount of annotated data is available. More importantly, MoViT can reach performance competitive with a vanilla ViT using only 3.0% of the training data. In conclusion, MoViT provides a simple plug-in for transformer architectures that may help reduce the training data needed to reach acceptable models for a broad range of medical image analysis tasks.
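The abstract sketches three mechanisms: an external memory caching attention key/value snapshots during training, an exponential-moving-average update of the cached entries (attention temporal moving average), and a distillation of the memory into a small representative subset for faster inference. The PyTorch sketch below is a minimal illustration of how such a layer could look; the class name MemorizingAttention, the single-head formulation, the fixed slot layout, and the plain k-means used for distillation are all our assumptions for illustration, not the authors' implementation.

```python
import torch


class MemorizingAttention(torch.nn.Module):
    """Single-head self-attention that also attends over an external
    memory of cached key/value snapshots (illustrative sketch)."""

    def __init__(self, dim, mem_slots=1024, ema_decay=0.9):
        super().__init__()
        self.qkv = torch.nn.Linear(dim, 3 * dim)
        self.scale = dim ** -0.5
        self.ema_decay = ema_decay
        # External memory buffers: cached snapshots, excluded from gradients.
        self.register_buffer("mem_k", torch.zeros(mem_slots, dim))
        self.register_buffer("mem_v", torch.zeros(mem_slots, dim))

    @torch.no_grad()
    def update_memory(self, k, v, slot_ids):
        # Attention temporal moving average (as named in the abstract):
        # blend fresh snapshots into the cached ones instead of overwriting
        # them, which the paper credits with reducing overfitting.
        # k, v: (len(slot_ids), dim) snapshots from the current step.
        d = self.ema_decay
        self.mem_k[slot_ids] = d * self.mem_k[slot_ids] + (1 - d) * k
        self.mem_v[slot_ids] = d * self.mem_v[slot_ids] + (1 - d) * v

    def forward(self, x):
        # x: (batch, tokens, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        b = x.size(0)
        # Attend jointly over the current tokens and the external memory.
        k_all = torch.cat([k, self.mem_k.expand(b, -1, -1)], dim=1)
        v_all = torch.cat([v, self.mem_v.expand(b, -1, -1)], dim=1)
        attn = (q @ k_all.transpose(-2, -1)) * self.scale
        return attn.softmax(dim=-1) @ v_all


@torch.no_grad()
def distill_prototypes(mem_k, mem_v, num_prototypes=64, iters=10):
    """Compress the memory into a small representative subset via plain
    k-means over the cached keys (a stand-in for the paper's prototypical
    attention learning), pooling the matching values per cluster."""
    start = torch.randperm(mem_k.size(0))[:num_prototypes]
    proto_k = mem_k[start].clone()
    for _ in range(iters):
        assign = torch.cdist(mem_k, proto_k).argmin(dim=1)
        for j in range(num_prototypes):
            members = assign == j
            if members.any():
                proto_k[j] = mem_k[members].mean(dim=0)
    proto_v = torch.zeros_like(proto_k)
    for j in range(num_prototypes):
        members = assign == j
        if members.any():
            proto_v[j] = mem_v[members].mean(dim=0)
    return proto_k, proto_v
```

In this reading, the distilled proto_k/proto_v would replace mem_k/mem_v at inference time, shrinking the extra attention span the memory adds and yielding the inference speedup the abstract describes.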

Cited By

1
FastSAM3D: An Efficient Segment Anything Model for 3D Volumetric Medical Images.
Med Image Comput Comput Assist Interv. 2024 Oct;15012:542-552. doi: 10.1007/978-3-031-72390-2_51. Epub 2024 Oct 23.
