Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:1444-1447. doi: 10.1109/EMBC48229.2022.9871372.
It is generally believed that vision transformers (ViTs) require huge amounts of data to generalize well, which limits their adoption. The introduction of data-efficient algorithms such as data-efficient image transformers (DeiT) provided an opportunity to explore the application of ViTs in medical imaging, where data scarcity is a limiting factor. In this work, we investigated the possibility of using pure transformers for the task of chest x-ray abnormality detection on a small dataset. Our proposed framework is built on a DeiT structure and benefits from a teacher-student training scheme, with a DenseNet with strong classification performance as the teacher and an adapted ViT as the student. The results show that the performance of transformers is on par with that of convolutional neural networks (CNNs). We achieved a test accuracy of 92.2% on the task of classifying chest x-ray images (normal/pneumonia/COVID-19) on a carefully selected dataset using pure transformers. These results demonstrate that transformers can complement or replace CNNs in achieving state-of-the-art performance in medical imaging applications. The code and models of this work are available at https://github.com/Ouantimb-Lab/DeiTCovid.
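The teacher-student scheme the abstract describes follows DeiT-style hard-label distillation: alongside the usual classification head trained on ground-truth labels, the student carries a distillation head trained to match the teacher's predicted class, and the two cross-entropy terms are averaged. The sketch below illustrates that loss in plain NumPy; the function and variable names (and the equal 0.5/0.5 weighting) are illustrative assumptions, not taken from the paper's released code.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the integer class labels.
    p = softmax(logits)
    return -np.log(p[np.arange(len(labels)), labels]).mean()

def hard_distillation_loss(student_cls_logits, student_dist_logits,
                           teacher_logits, true_labels):
    # DeiT "hard" distillation: the class head learns the ground truth,
    # while the distillation head learns the teacher's argmax prediction
    # (here, the teacher would be the DenseNet classifier).
    teacher_labels = teacher_logits.argmax(axis=-1)
    return 0.5 * cross_entropy(student_cls_logits, true_labels) \
         + 0.5 * cross_entropy(student_dist_logits, teacher_labels)

# Toy batch of 2 images, 3 classes (normal / pneumonia / COVID-19).
rng = np.random.default_rng(0)
s_cls = rng.normal(size=(2, 3))   # student class-token logits
s_dist = rng.normal(size=(2, 3))  # student distillation-token logits
t_logits = rng.normal(size=(2, 3))  # teacher (CNN) logits
y = np.array([0, 2])              # ground-truth labels
loss = hard_distillation_loss(s_cls, s_dist, t_logits, y)
```

At inference time, DeiT averages the predictions of the class and distillation heads; only the training loss is sketched here.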