Li Fangyun, Zhou Lingxiao, Wang Yunpeng, Chen Chuan, Yang Shuyi, Shan Fei, Liu Lei
Institute of Biomedical Sciences, Fudan University, Shanghai, China.
Institute of Microscale Optoelectronics, Shenzhen University, Shenzhen, China.
Quant Imaging Med Surg. 2022 Jun;12(6):3364-3378. doi: 10.21037/qims-21-1117.
Computer-aided diagnosis based on chest X-ray (CXR) is an exponentially growing field of research owing to the development of deep learning, especially convolutional neural networks (CNNs). However, due to the intrinsic locality of convolution operations, CNNs cannot model long-range dependencies. Although vision transformers (ViTs) have recently been proposed to alleviate this limitation, ViTs trained on patches cannot learn dependencies between inter-patch pixels and are thus insufficient for medical image detection. To address this problem, in this paper we propose a CXR detection method that integrates a CNN with a ViT to model both patch-wise and inter-patch dependencies.
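The inter-patch limitation above follows from how a ViT tokenizes its input: the image is split into non-overlapping patches, and attention operates only on whole-patch embeddings, so pixels in different patches never interact below patch granularity. A minimal sketch of this patchification step (the 224-pixel image size and 16-pixel patch size are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# A ViT splits an image into non-overlapping patches and attends across them.
# Two pixels in different patches interact only through their patch embeddings,
# which is the inter-patch limitation targeted in the abstract.
img = np.arange(224 * 224).reshape(224, 224)  # toy single-channel image
P = 16                                        # illustrative patch size

# Rearrange (H, W) into (num_patches, P*P) flattened patch vectors.
patches = (img.reshape(224 // P, P, 224 // P, P)
              .swapaxes(1, 2)
              .reshape(-1, P * P))
print(patches.shape)  # (196, 256): 14 x 14 patches of 16 x 16 pixels each
```

Each row of `patches` would then be linearly projected into a token embedding before entering the transformer encoder.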
We experimented on the ChestX-ray14 dataset and followed the official training-test set split. Because the training data had only global (image-level) annotations, the detection network was weakly supervised. A DenseNet with a feature pyramid structure was designed and integrated with an adaptive ViT to model inter-patch and patch-wise long-range dependencies and to obtain fine-grained feature maps. We compared the performance of our method with that of other disease detection methods.
For disease classification, our method achieved the best result among all the disease detection methods, with a mean area under the curve (AUC) of 0.829. For lesion localization, our method achieved significantly higher intersection over union (IoU) scores on the test images with bounding box annotations than did the other detection methods. The visualized results showed that our predictions were more accurate and detailed. Furthermore, evaluation of our method on an external validation dataset demonstrated its generalization ability.
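IoU is the standard overlap metric for such localization evaluations: the area shared by the predicted and ground-truth boxes divided by the area of their union. A minimal sketch of the computation (the `(x1, y1, x2, y2)` corner format is an assumption for illustration, not the paper's stated convention):

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.1429
```

A localization is typically counted as correct when the IoU between the predicted and annotated boxes exceeds a chosen threshold (e.g., 0.5).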
Our proposed method achieves the new state of the art for thoracic disease classification and weakly supervised localization. It has potential to assist in clinical decision-making.