Department of Radiology, University Hospital, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany.
X-Ray Products, Siemens Healthineers, Forchheim, Germany.
Eur Radiol. 2021 Oct;31(10):7888-7900. doi: 10.1007/s00330-021-07833-w. Epub 2021 Mar 27.
The diagnostic accuracy of artificial intelligence (AI) pneumothorax (PTX) detection in chest radiographs (CXR) is limited by noisy annotations in public training data and by confounding thoracic tubes (TTs). We hypothesize that in-image annotations of the dehiscent visceral pleura used for algorithm training boost the algorithm's performance and suppress confounders.
Our single-center evaluation cohort of 3062 supine CXRs includes 760 PTX-positive cases with radiological annotations of PTX size and inserted TTs. Three incrementally improved algorithms (differing in architecture, in training data from public datasets or clinical sites, and in whether in-image annotations were included in training) were characterized by the area under the receiver operating characteristic curve (AUROC) in detailed subgroup analyses and benchmarked against the well-established "CheXNet" algorithm.
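To illustrate what a TT-stratified subgroup analysis can reveal, the following minimal sketch computes the AUROC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties count as one half), both overall and separately within each subgroup. The cohort and the "shortcut" scorer, which looks only at whether a tube is present, are hypothetical toy data rather than the study's cases or any of the evaluated algorithms.

```python
from typing import Dict, List, Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as P(score of a random positive > score of a random negative), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def subgroup_auroc(
    scores: Sequence[float], labels: Sequence[int], groups: Sequence[int]
) -> Dict[int, float]:
    """Compute the AUROC separately for every subgroup value (e.g. tube absent = 0, present = 1)."""
    result: Dict[int, float] = {}
    for g in sorted(set(groups)):
        idx: List[int] = [i for i, gi in enumerate(groups) if gi == g]
        result[g] = auroc([scores[i] for i in idx], [labels[i] for i in idx])
    return result


# Hypothetical toy cohort: PTX label and TT flag per radiograph.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
tube = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical shortcut model: its score depends only on whether a tube is visible.
scores = [0.9 if t else 0.1 for t in tube]

print(auroc(scores, labels))                   # 0.75 overall
print(subgroup_auroc(scores, labels, tube))    # {0: 0.5, 1: 0.5}
```

Because PTX and tubes co-occur in this toy cohort, the tube-driven scorer looks moderately useful overall (AUROC 0.75) yet is at chance level (0.5) within both the tube-present and tube-absent subgroups, which is the kind of confounding a stratified analysis is designed to expose.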
The performance of established algorithms trained exclusively on publicly available data without in-image annotations is limited to AUROCs of 0.778 and strongly biased towards TTs, a bias that can completely eliminate the algorithm's discriminative power in individual subgroups. In contrast, our final "algorithm 2", which was trained on fewer images but additionally with in-image annotations of the dehiscent pleura, achieved an overall AUROC of 0.877 for unilateral PTX detection with a significantly reduced TT-related confounding bias.
We demonstrated strong limitations of an established PTX-detecting AI algorithm, and showed that these can be significantly reduced by designing an AI system capable of learning to both classify and localize PTX. Our results aim to draw attention to the need for high-quality in-image localization in training data to reduce the risk of unintentionally biasing the training of pathology-detecting AI algorithms.
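The abstract does not specify the architecture or loss function of "algorithm 2". As a hedged sketch only, one generic way to make a model learn to both classify and localize is to add a per-pixel loss on the annotated region (here, a mask of the dehiscent pleura) to the image-level classification loss, applying it only when an in-image annotation exists. The function names, the `seg_weight` parameter, and the representation of images as flat lists of pixel probabilities are hypothetical choices for illustration, not the authors' method.

```python
import math
from typing import Optional, Sequence

EPS = 1e-7  # clip probabilities away from 0 and 1 to keep log() finite


def bce(p: float, y: float) -> float:
    """Binary cross-entropy for a single predicted probability p and target y."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def classify_and_localize_loss(
    image_prob: float,
    image_label: int,
    pixel_probs: Sequence[float],
    pixel_mask: Optional[Sequence[int]],
    seg_weight: float = 1.0,  # hypothetical weighting between the two tasks
) -> float:
    """Hypothetical multi-task loss: image-level PTX classification plus, when an
    in-image annotation is available, the mean per-pixel loss against that mask."""
    loss = bce(image_prob, image_label)
    if pixel_mask is not None:
        if len(pixel_mask) != len(pixel_probs):
            raise ValueError("mask and prediction must have the same number of pixels")
        seg = sum(bce(p, m) for p, m in zip(pixel_probs, pixel_mask)) / len(pixel_mask)
        loss += seg_weight * seg
    return loss


# Image without an in-image annotation: only the classification term applies.
print(classify_and_localize_loss(0.5, 1, [0.5, 0.5], None))      # ln 2
# Annotated image: the localization term is added.
print(classify_and_localize_loss(0.5, 1, [0.5, 0.5], [1, 0]))    # 2 * ln 2
```

The intuition behind such a design is that the pixel-level term rewards evidence located at the annotated pleural line, so a model cannot reduce that term by relying on a tube elsewhere in the image; whether and how much this helps depends on the architecture and data actually used.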
• Established pneumothorax-detecting artificial intelligence algorithms trained on public training data are strongly limited and biased by confounding thoracic tubes.
• We used high-quality in-image annotated training data to effectively boost algorithm performance and suppress the impact of confounding thoracic tubes.
• Based on our results, we hypothesize that even hidden confounders might be effectively addressed by in-image annotations of pathology-related image features.