Oakden-Rayner Lauren, Gale William, Bonham Thomas A, Lungren Matthew P, Carneiro Gustavo, Bradley Andrew P, Palmer Lyle J
School of Public Health, University of Adelaide, Adelaide, SA, Australia; Australian Institute for Machine Learning, University of Adelaide, Adelaide, SA, Australia.
Australian Institute for Machine Learning, University of Adelaide, Adelaide, SA, Australia; School of Computer Science, University of Adelaide, Adelaide, SA, Australia.
Lancet Digit Health. 2022 May;4(5):e351-e358. doi: 10.1016/S2589-7500(22)00004-8. Epub 2022 Apr 5.
Proximal femoral fractures are an important clinical and public health issue associated with substantial morbidity and early mortality. Artificial intelligence might offer improved diagnostic accuracy for these fractures, but typical approaches to testing of artificial intelligence models can underestimate the risks of artificial intelligence-based diagnostic systems.
We present a preclinical evaluation of a deep learning model intended to detect proximal femoral fractures on frontal x-ray films in emergency department patients, trained on films from the Royal Adelaide Hospital (Adelaide, SA, Australia). This evaluation included: a reader study comparing the performance of the model against five radiologists (three musculoskeletal specialists and two general radiologists) on a dataset of 200 fracture cases and 200 non-fractures (also from the Royal Adelaide Hospital); an external validation study using a dataset obtained from Stanford University Medical Center, CA, USA; and an algorithmic audit to detect any unusual or unexpected model behaviour.
In the reader study, the area under the receiver operating characteristic curve (AUC) for the performance of the deep learning model was 0·994 (95% CI 0·988-0·999) compared with an AUC of 0·969 (0·960-0·978) for the five radiologists. This strong model performance was maintained on external validation, with an AUC of 0·980 (0·931-1·000). However, the preclinical evaluation identified barriers to safe deployment, including a substantial shift in the model operating point on external validation and an increased error rate on cases with abnormal bones (eg, Paget's disease).
The model outperformed the radiologists tested and maintained performance on external validation, but showed several unexpected limitations during further testing. Thorough preclinical evaluation of artificial intelligence models, including algorithmic auditing, can reveal unexpected and potentially harmful behaviour even in high-performance artificial intelligence systems, which can inform future clinical testing and deployment decisions.
Funding: None.