Clinically focused multi-cohort benchmarking as a tool for external validation of artificial intelligence algorithm performance in basic chest radiography analysis.

Affiliations

Department of Radiology, University Hospital, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany.

Comprehensive Pneumology Center, German Center for Lung Research, Munich, Germany.

Publication information

Sci Rep. 2022 Jul 27;12(1):12764. doi: 10.1038/s41598-022-16514-7.

Abstract

Artificial intelligence (AI) algorithms evaluating [supine] chest radiographs ([S]CXRs) have recently increased markedly in number. Since training and validation are often performed on subsets of the same overall dataset, external validation is mandatory to reproduce results and reveal potential training errors. We applied multi-cohort benchmarking to the publicly accessible (S)CXR-analyzing AI algorithm CheXNet. The benchmark comprises three clinically relevant study cohorts that differ in patient positioning ([S]CXRs), the applied reference standards (CT-/[S]CXR-based) and the possibility of also comparing algorithm classification with the reading performance of different medical experts. The study cohorts include [1] a cohort of 563 CXRs acquired in the emergency unit that were evaluated by 9 readers (radiologists and non-radiologists) for 4 common pathologies, [2] a collection of 6,248 SCXRs annotated by radiologists for pneumothorax presence, pneumothorax size and the presence of inserted thoracic tube material, which allowed for subgroup and confounding bias analysis, and [3] a cohort of 166 patients with SCXRs that were evaluated by radiologists for underlying causes of basal lung opacities, all of these cases having been correlated with a promptly acquired computed tomography scan (SCXR and CT less than 90 min apart). CheXNet non-significantly exceeded the radiology resident (RR) consensus in the detection of suspicious lung nodules (cohort [1], AUC AI/RR: 0.851/0.839, p = 0.793) and, in SCXR, the radiological readers in the detection of basal pneumonia (cohort [3], AUC AI/reader consensus: 0.825/0.782, p = 0.390) and basal pleural effusion (cohort [3], AUC AI/reader consensus: 0.762/0.710, p = 0.336), partly with AUC values higher than originally published ("Nodule": 0.780, "Infiltration": 0.735, "Effusion": 0.864). The classifier "Infiltration" turned out to be very dependent on patient positioning (best in CXR, worst in SCXR). The pneumothorax SCXR cohort [2] revealed poor algorithm performance in CXRs without inserted thoracic material and in the detection of small pneumothoraces, which can be explained by a known systematic confounding error in the algorithm training process. The differences in algorithm performance compared with the original publication demonstrate the benefit of clinically relevant external validation. Finally, our multi-cohort benchmarking enables the consideration of confounders, different reference standards and patient positioning, as well as the comparison of AI performance with medical readers of differing qualification.
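The confounding effect described for cohort [2] can be illustrated with a small sketch. It computes the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, first on pooled data and then separately per subgroup. The function names (`auc`, `auc_by_subgroup`) and the toy scores are assumptions made for illustration only; they are not the study's data, and the abstract does not state which software or statistical test the authors used.

```python
from typing import Dict, Hashable, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_subgroup(
    labels: Sequence[int],
    scores: Sequence[float],
    groups: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """Compute the AUC separately within each subgroup value."""
    result = {}
    for g in set(groups):
        idx = [i for i, x in enumerate(groups) if x == g]
        result[g] = auc([labels[i] for i in idx], [scores[i] for i in idx])
    return result


# Hypothetical toy data (NOT from the study): a classifier whose score
# mostly tracks the presence of a chest tube rather than the pneumothorax.
labels = [1, 1, 1, 0, 1, 0, 0, 0]            # 1 = pneumothorax present
tube = [True, True, True, True, False, False, False, False]
scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.3, 0.1, 0.25]

pooled = auc(labels, scores)
stratified = auc_by_subgroup(labels, scores, tube)
print(f"pooled AUC: {pooled:.3f}")
for has_tube, value in sorted(stratified.items(), reverse=True):
    print(f"tube={has_tube}: AUC {value:.3f}")
```

In this toy example the pooled AUC looks acceptable while the AUC in cases without a tube is below chance, which is the kind of pattern that a subgroup analysis is meant to expose. A significance test for the difference between two AUCs (as reported by the p-values in the abstract) would need an additional step that is not shown here.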

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/866d/9329327/41d2ffe38e0c/41598_2022_16514_Fig1_HTML.jpg
