Rajaraman Sivaramakrishnan, Zamzmi Ghada, Folio Les R, Antani Sameer
Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States.
Moffitt Cancer Center, Tampa, FL, United States.
Front Genet. 2022 Feb 24;13:864724. doi: 10.3389/fgene.2022.864724. eCollection 2022.
Research on detecting Tuberculosis (TB) findings on chest radiographs (chest X-rays: CXRs) using convolutional neural networks (CNNs) has demonstrated superior performance due to the emergence of publicly available, large-scale datasets with expert annotations and the availability of scalable computational resources. However, these studies use only the frontal CXR projections, i.e., the posterior-anterior (PA) and anterior-posterior (AP) views, for analysis and decision-making. Lateral CXRs, which have heretofore been understudied, can help detect clinically suspected pulmonary TB, particularly in children. Further, Vision Transformers (ViTs) with built-in self-attention mechanisms have recently emerged as a viable alternative to traditional CNNs. Although ViTs have demonstrated notable performance in several medical image analysis tasks, CNN and ViT models may differ in performance and computational efficiency, necessitating a comprehensive analysis to select appropriate models for the problem under study. This study aims to detect TB-consistent findings in lateral CXRs by constructing an ensemble of CNN and ViT models. Several models are trained on lateral CXR data extracted from two large public collections to transfer modality-specific knowledge and fine-tune them for detecting findings consistent with TB. We observed that the weighted averaging ensemble of the predictions of the CNN and ViT models, using optimal weights computed with the Sequential Least-Squares Quadratic Programming method, delivered significantly superior performance (MCC: 0.8136, 95% confidence interval (CI): 0.7394, 0.8878; p < 0.05) compared to the individual models and other ensembles. We also interpreted the decisions of the CNN and ViT models using class-selective relevance maps and attention maps, respectively, and combined them to highlight the discriminative image regions contributing to the final output.
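The weighted-averaging step described above can be sketched as a constrained optimization: find convex weights over the member models' predicted probabilities that minimize a classification loss, solved with SciPy's SLSQP optimizer. This is a minimal illustration, not the authors' implementation; the log-loss objective and the function/variable names are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def optimal_ensemble_weights(member_probs, labels):
    """Compute convex weights for averaging member model predictions.

    member_probs: (n_models, n_samples) array of each model's predicted
    probability for the TB-positive class; labels: binary ground truth.
    Weights are found with SLSQP, constrained to the probability simplex.
    """
    n_models = member_probs.shape[0]

    def loss(w):
        # Log loss of the weighted-average prediction (assumed objective).
        p = np.clip(w @ member_probs, 1e-7, 1 - 1e-7)
        return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

    # Non-negative weights that sum to one.
    constraints = {"type": "eq", "fun": lambda w: np.sum(w) - 1.0}
    bounds = [(0.0, 1.0)] * n_models
    w0 = np.full(n_models, 1.0 / n_models)  # start from equal weighting
    result = minimize(loss, w0, method="SLSQP",
                      bounds=bounds, constraints=constraints)
    return result.x

# Toy example with two hypothetical model outputs on four samples.
probs = np.array([[0.9, 0.2, 0.8, 0.3],
                  [0.6, 0.4, 0.7, 0.1]])
y = np.array([1, 0, 1, 0])
w = optimal_ensemble_weights(probs, y)
ensemble_probs = w @ probs  # weighted-average ensemble prediction
```

Because the equal-weight and single-model solutions lie inside the feasible simplex, the optimized weights can only match or improve the training loss relative to any individual member.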
We observed that (i) model accuracy is not related to disease region-of-interest (ROI) localization performance and (ii) the bitwise AND of the heatmaps of the top-2-performing models delivered significantly superior ROI localization performance in terms of mean average precision [mAP@(0.1:0.6) = 0.1820, 95% CI: 0.0771, 0.2869; p < 0.05] compared to the other individual models and ensembles. The code is available at https://github.com/sivaramakrishnan-rajaraman/Ensemble-of-CNN-and-ViT-for-TB-detection-in-lateral-CXR.
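The bitwise-AND combination of two saliency maps can be sketched as follows: binarize each map and keep only the pixels both models mark as discriminative. This is a simplified illustration; the threshold value and function names are assumptions, not details from the paper.

```python
import numpy as np

def combine_heatmaps(map_a, map_b, threshold=0.5):
    """Intersect two saliency maps via a bitwise AND of binarized masks.

    map_a, map_b: saliency/attention maps normalized to [0, 1]
    (e.g., a CNN relevance map and a ViT attention map).
    threshold: hypothetical binarization cutoff (assumed, not from the paper).
    Returns a binary mask of regions both models agree on.
    """
    mask_a = (map_a >= threshold).astype(np.uint8)
    mask_b = (map_b >= threshold).astype(np.uint8)
    return np.bitwise_and(mask_a, mask_b)

# Toy 2x2 example: only pixels salient in BOTH maps survive the AND.
a = np.array([[0.9, 0.1], [0.7, 0.6]])
b = np.array([[0.8, 0.9], [0.2, 0.7]])
mask = combine_heatmaps(a, b)
# mask → [[1, 0], [0, 1]]
```

The intersection suppresses regions highlighted by only one model, which is consistent with the reported gain in ROI localization precision from combining the top-2 models' heatmaps.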