The Subgroup Imperative: Chest Radiograph Classifier Generalization Gaps in Patient, Setting, and Pathology Subgroups.

Author information

Ahluwalia Monish, Abdalla Mohamed, Sanayei James, Seyyed-Kalantari Laleh, Hussain Mohannad, Ali Amna, Fine Benjamin

Affiliations

From the Kingston Health Sciences Centre, Queen's University, Kingston, Ontario, Canada (M. Ahluwalia); Faculty of Medicine (M. Ahluwalia, J.S.), Institute of Health Policy, Management and Evaluation (M. Ahluwalia), Department of Computer Science (M. Abdalla, L.S.K.), and Department of Medical Imaging (B.F.), University of Toronto, Toronto, Ontario, Canada; Vector Institute for Artificial Intelligence, Toronto, Canada (M. Abdalla, B.F.); Institute for Better Health (M. Abdalla, A.A., B.F.) and Department of Diagnostic Imaging (A.A., B.F.), Trillium Health Partners, 100 Queensway West, Clinical Administrative Building, 6th Floor, Mississauga, ON, Canada L5B 1B8; Department of Medicine, Royal University Hospital, Saskatoon, Saskatchewan, Canada (J.S.); Department of Electrical Engineering and Computer Science, York University, Toronto, Ontario, Canada (L.S.K.); and Techie Maestro, Waterloo, Ontario, Canada (M.H.).

Publication information

Radiol Artif Intell. 2023 Jul 12;5(5):e220270. doi: 10.1148/ryai.220270. eCollection 2023 Sep.

Abstract

PURPOSE

To externally test four chest radiograph classifiers on a large, diverse, real-world dataset with robust subgroup analysis.

MATERIALS AND METHODS

In this retrospective study, adult posteroanterior chest radiographs (January 2016-December 2020) and associated radiology reports from Trillium Health Partners in Ontario, Canada, were extracted and de-identified. An open-source natural language processing tool was locally validated and used to generate ground truth labels for the 197 540-image dataset based on the associated radiology report. Four classifiers generated predictions on each chest radiograph. Performance was evaluated using accuracy, positive predictive value, negative predictive value, sensitivity, specificity, F1 score, and Matthews correlation coefficient for the overall dataset and for patient, setting, and pathology subgroups.
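
As a minimal sketch (not the study's code), assume each image is reduced to a binary ground-truth label (1 = abnormal finding in the report-derived label, 0 = normal) and a binary classifier prediction. Under that assumption, all seven metrics listed above follow from the four confusion-matrix counts. The helper names `safe_div` and `binary_metrics` are hypothetical. Ratios that are undefined, such as sensitivity in a subgroup with no positive images, are returned as NaN.

```python
from math import sqrt
from typing import Sequence


def safe_div(num: float, den: float) -> float:
    """Return num / den, or NaN when the denominator is zero (e.g. a subgroup with no positives)."""
    return num / den if den else float("nan")


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Compute the seven abstract-listed metrics from 0/1 labels and 0/1 predictions (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # abnormal, flagged abnormal
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, flagged normal
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal, flagged abnormal
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # abnormal, missed

    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "ppv": safe_div(tp, tp + fp),          # positive predictive value (precision)
        "npv": safe_div(tn, tn + fn),          # negative predictive value
        "sensitivity": safe_div(tp, tp + fn),  # true-positive rate (recall)
        "specificity": safe_div(tn, tn + fp),  # true-negative rate
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),  # harmonic mean of PPV and sensitivity
        # Matthews correlation coefficient; NaN if any confusion-matrix margin is empty
        "mcc": safe_div(tp * tn - fp * fn,
                        sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


if __name__ == "__main__":
    print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 1, 0, 0, 1]))
```

For example, the toy call above has TP = 3, FN = 1, TN = 3, and FP = 1. That gives 0.75 for accuracy, sensitivity, specificity, PPV, NPV, and F1, and an MCC of 0.5. MCC is included alongside accuracy because it stays informative when normal and abnormal images are imbalanced.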

RESULTS

Classifiers demonstrated 68%-77% accuracy, 64%-75% sensitivity, and 82%-94% specificity on the external testing dataset. Algorithms showed decreased sensitivity for solitary findings (43%-65%), patients younger than 40 years (27%-39%), and patients in the emergency department (38%-60%) and decreased specificity on normal chest radiographs with support devices (59%-85%). Differences in sex and ancestry represented movements along an algorithm's receiver operating characteristic curve.
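
A movement along the ROC curve means that, within a subgroup, a classifier trades sensitivity for specificity at a fixed threshold while its threshold-free discrimination (AUROC) stays the same. The sketch below assumes hypothetical records of (subgroup, ground-truth label, model score) and an assumed decision threshold of 0.5. For each subgroup, it reports sensitivity and specificity at that threshold alongside a pairwise AUROC. The function names and toy scores are illustrative, not study data.

```python
from collections import defaultdict
from typing import Iterable


def auroc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    if not pos_scores or not neg_scores:
        return float("nan")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def subgroup_report(records: Iterable[tuple[str, int, float]],
                    threshold: float = 0.5) -> dict[str, dict[str, float]]:
    """Per-subgroup sensitivity/specificity at one fixed threshold, plus threshold-free AUROC."""
    groups: dict[str, list[tuple[int, float]]] = defaultdict(list)
    for subgroup, label, score in records:
        groups[subgroup].append((label, score))

    report = {}
    for subgroup, rows in groups.items():
        pos = [s for y, s in rows if y == 1]
        neg = [s for y, s in rows if y == 0]
        report[subgroup] = {
            # A score at or above the threshold counts as a positive (abnormal) prediction.
            "sensitivity": sum(s >= threshold for s in pos) / len(pos) if pos else float("nan"),
            "specificity": sum(s < threshold for s in neg) / len(neg) if neg else float("nan"),
            "auroc": auroc(pos, neg),
        }
    return report


if __name__ == "__main__":
    # Toy data: subgroup B's scores are subgroup A's shifted upward by 0.2.
    records = ([("A", 1, s) for s in (0.45, 0.6, 0.8)] + [("A", 0, s) for s in (0.2, 0.3, 0.55)]
               + [("B", 1, s) for s in (0.65, 0.8, 1.0)] + [("B", 0, s) for s in (0.4, 0.5, 0.75)])
    for name, stats in subgroup_report(records).items():
        print(name, stats)
```

In the toy example, both subgroups have an AUROC of 8/9. However, B's sensitivity rises from 2/3 to 1.0 while its specificity falls from 2/3 to 1/3. A gap like this reflects a different operating point on the same curve, whereas a lower subgroup AUROC would point to a genuine difference in discrimination.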

CONCLUSION

Performance of deep learning chest radiograph classifiers was subject to patient, setting, and pathology factors, demonstrating that subgroup analysis is necessary to inform implementation and monitor ongoing performance to ensure optimal quality, safety, and equity.

Keywords: Conventional Radiography, Thorax, Ethics, Supervised Learning, Convolutional Neural Network (CNN), Machine Learning Algorithms

© RSNA, 2023

See also the commentary by Huisman and Hannink in this issue.

Visual abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e1e1/10546359/39d276574f74/ryai.220270.VA.jpg