IEViT: An enhanced vision transformer architecture for chest X-ray image classification.

Author affiliations

University of the West of Scotland, High St., Paisley, PA1 2BE, UK.

Durham University, Stockton Road, Durham, DH1 3LE, UK.

Publication information

Comput Methods Programs Biomed. 2022 Nov;226:107141. doi: 10.1016/j.cmpb.2022.107141. Epub 2022 Sep 16.

DOI:10.1016/j.cmpb.2022.107141

PMID:36162246

Abstract

BACKGROUND AND OBJECTIVE

Chest X-ray imaging is a relatively cheap and accessible diagnostic tool that can assist in the diagnosis of various conditions, including pneumonia, tuberculosis, COVID-19, and others. However, the requirement for expert radiologists to view and interpret chest X-ray images can be a bottleneck, especially in remote and deprived areas. Recent advances in machine learning have made possible the automated diagnosis of chest X-ray scans. In this work, we examine the use of a novel Transformer-based deep learning model for the task of chest X-ray image classification.

METHODS

We first examine the performance of the Vision Transformer (ViT), a state-of-the-art image classification model, on the task of chest X-ray image classification. We then propose and evaluate the Input Enhanced Vision Transformer (IEViT), a novel enhanced Vision Transformer model that can achieve improved performance on chest X-ray images associated with various pathologies.
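
The abstract does not describe the internals of IEViT, so the sketch below illustrates only the well-known input step of a standard ViT: an image is cut into fixed-size, non-overlapping patches, and each patch is flattened into a vector that later becomes a token. The function name `split_into_patches`, the single-channel list-of-rows image format, and the toy 4x4 input are assumptions made for illustration and are not taken from the paper.

```python
from typing import List

# Assumption: a single-channel image stored as rows of pixel values.
Image = List[List[float]]


def split_into_patches(image: Image, patch_size: int) -> List[List[float]]:
    """Split a 2D image into non-overlapping square patches.

    Each patch is flattened in row-major order, which is the form a
    standard ViT projects linearly into a token embedding.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Hypothetical toy input: a 4x4 image cut into 2x2 patches.
toy = [[float(4 * r + c) for c in range(4)] for r in range(4)]
tokens = split_into_patches(toy, patch_size=2)
```

With this toy input, `tokens` holds four flattened patches of four values each. A real ViT would additionally apply a learned linear projection, prepend a class token, and add position embeddings before the Transformer encoder.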

RESULTS

Experiments on four chest X-ray image data sets containing various pathologies (tuberculosis, pneumonia, COVID-19) demonstrated that the proposed IEViT model outperformed ViT for all the data sets and variants examined, achieving an F1-score between 96.39% and 100% and an improvement over ViT of up to +5.82% in F1-score. IEViT's maximum sensitivity (recall) ranged from 93.50% to 100% across the four data sets, an improvement over ViT of up to +3%, and its maximum precision ranged from 97.96% to 100%, an improvement over ViT of up to +6.41%.
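
For readers less familiar with the reported metrics, the following minimal sketch shows how precision, sensitivity (recall), and F1-score are conventionally computed from confusion-matrix counts for a binary classifier. The function name `precision_recall_f1` and the example counts are hypothetical and are not results from the study.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall (sensitivity), and F1 from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Each metric falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Hypothetical counts, chosen only to illustrate the arithmetic.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=10)
```

Because F1 is a harmonic mean, it stays high only when precision and recall are both high, which is why the abstract reports all three figures.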

CONCLUSIONS

Results showed that the proposed IEViT model outperformed all ViT variants on all the examined chest X-ray image data sets, demonstrating its superiority and generalisation ability. Given the relatively low cost and widespread accessibility of chest X-ray imaging, the proposed IEViT model can potentially offer a powerful, yet relatively cheap and accessible, method for assisting diagnosis using chest X-ray images.
