Division of Orthopaedic Surgery, Department of Surgery, University of Toronto, Toronto, Ontario, Canada.
Institute of Biomedical Engineering, University of Toronto, Toronto, Ontario, Canada.
JAMA Netw Open. 2023 Mar 1;6(3):e233391. doi: 10.1001/jamanetworkopen.2023.3391.
Artificial intelligence (AI) enables powerful models for establishing clinical diagnostic and prognostic tools for hip fractures; however, the performance and potential impact of these newly developed algorithms are currently unknown.
To evaluate the performance of AI algorithms designed to diagnose hip fractures on radiographs and to predict clinical outcomes after hip fracture surgery, relative to current practices.
A systematic review of the literature was performed using the MEDLINE, Embase, and Cochrane Library databases for all articles published from database inception to January 23, 2023. A manual reference search of included articles was also undertaken to identify any additional relevant articles.
Studies developing machine learning (ML) models for the diagnosis of hip fractures from hip or pelvic radiographs or to predict any postoperative patient outcome following hip fracture surgery were included.
This study followed the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) reporting guideline and was registered with PROSPERO. Eligible full-text articles were evaluated, and relevant data were extracted independently using a template data extraction form. For studies that predicted postoperative outcomes, the performance of traditional predictive statistical models, either multivariable logistic or linear regression, was recorded and compared with the performance of the best ML model on the same out-of-sample data set.
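As a rough illustration of comparing a regression model and an ML model on the same out-of-sample data, the sketch below computes the area under the ROC curve (AUC) for two sets of risk scores using the Mann-Whitney formulation: the probability that a randomly chosen patient with the outcome receives a higher score than one without it, with ties counted as one half. The labels and scores are hypothetical, and the abstract does not state how the included studies computed their AUCs. A formal paired test of two AUCs on one data set would typically use a method such as DeLong's test, which this sketch does not implement.

```python
from typing import Sequence


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as 1/2 (Mann-Whitney U divided by n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical held-out set: 1 = died during follow-up, 0 = survived.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
logistic_scores = [0.70, 0.40, 0.55, 0.30, 0.45, 0.20, 0.10, 0.35]  # regression model
ml_scores = [0.80, 0.60, 0.25, 0.30, 0.40, 0.20, 0.15, 0.55]  # ML model

print(f"logistic regression AUC: {auc_mann_whitney(logistic_scores, labels):.3f}")
print(f"ML model AUC:            {auc_mann_whitney(ml_scores, labels):.3f}")
```

The pairwise loop is quadratic in sample size, which is fine for illustration; a rank-based formula gives the same value faster on large cohorts.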
Diagnostic accuracy of AI models was compared with the diagnostic accuracy of expert clinicians using odds ratios (ORs) with 95% CIs. Areas under the curve for postoperative outcome prediction between traditional statistical models (multivariable linear or logistic regression) and ML models were compared.
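A minimal sketch of how an odds ratio for diagnostic error could be derived, assuming each reader's performance is summarized as confusion-matrix counts and a diagnostic error means a false positive or a false negative. The Reads class, the counts, and the Woolf (log-scale) confidence interval are illustrative assumptions rather than the review's stated procedure. The sketch also treats the two readers' reads as independent samples, which ignores the pairing when both read the same images. Sensitivity, specificity, and F1 are included because they are the per-model summaries reported in the results.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Reads:
    """Hypothetical confusion-matrix counts for one reader (ML model or clinician)."""
    tp: int  # fractures correctly flagged
    fp: int  # non-fractures wrongly flagged
    tn: int  # non-fractures correctly cleared
    fn: int  # fractures missed

    @property
    def errors(self) -> int:
        return self.fp + self.fn

    @property
    def correct(self) -> int:
        return self.tp + self.tn

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def f1(self) -> float:
        # Harmonic mean of precision and sensitivity: 2TP / (2TP + FP + FN)
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


def error_odds_ratio(model: Reads, clinician: Reads, z: float = 1.96) -> Tuple[float, float, float]:
    """OR of making a diagnostic error (model vs clinician) with a Woolf
    log-scale 95% CI. Assumes every cell of the 2x2 table is non-zero."""
    a, b = model.errors, model.correct
    c, d = clinician.errors, clinician.correct
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts on the same 200 radiographs (100 with a fracture).
model = Reads(tp=90, fp=12, tn=88, fn=10)
clinician = Reads(tp=85, fp=10, tn=90, fn=15)
or_, lo, hi = error_odds_ratio(model, clinician)
print(f"sensitivity={model.sensitivity():.3f} specificity={model.specificity():.3f} F1={model.f1():.3f}")
print(f"error OR={or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An OR below 1 means the model made errors at lower odds than the clinician; a CI that spans 1, as in the pooled result reported for this review, means the difference is not statistically significant.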
Of 39 studies that met all criteria and were included in this analysis, 18 (46.2%) used AI models to diagnose hip fractures on plain radiographs and 21 (53.8%) used AI models to predict patient outcomes following hip fracture surgery. A total of 39 598 plain radiographs and 714 939 hip fractures were used for training, validating, and testing ML models specific to diagnosis and postoperative outcome prediction, respectively. Mortality and length of hospital stay were the most frequently predicted outcomes. On pooled data analysis, compared with clinicians, the OR for diagnostic error of ML models was 0.79 (95% CI, 0.48-1.31; P = .36; I2 = 60%) for hip fracture radiographs. For the ML models, the mean (SD) sensitivity was 89.3% (8.5%), specificity was 87.5% (9.9%), and F1 score was 0.90 (0.06). The mean area under the curve for mortality prediction was 0.84 with ML models compared with 0.79 for alternative controls (P = .09).
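A pooled OR with an I2 value comes from combining per-study estimates. One common approach is DerSimonian-Laird random-effects pooling of log odds ratios, sketched below. The abstract does not name the pooling model or software, so treat this choice as an assumption; the per-study inputs are hypothetical. I2 expresses how much of the variation between studies exceeds what chance alone would produce.

```python
import math
from typing import List, Tuple


def pool_log_odds_ratios(
    log_ors: List[float], variances: List[float], z: float = 1.96
) -> Tuple[float, float, float, float]:
    """DerSimonian-Laird random-effects pooling of per-study log odds ratios.

    Returns (pooled OR, lower CI bound, upper CI bound, I^2 in percent).
    """
    k = len(log_ors)
    w = [1 / v for v in variances]  # fixed-effect inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    df = k - 1
    # I^2: share of total variation attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Method-of-moments estimate of the between-study variance tau^2
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se), i2


# Hypothetical per-study log ORs for diagnostic error (ML vs clinician) and their variances.
study_log_ors = [math.log(0.5), math.log(1.2), math.log(0.9), math.log(0.6)]
study_vars = [0.04, 0.06, 0.05, 0.08]
pooled_or, ci_lo, ci_hi, i2 = pool_log_odds_ratios(study_log_ors, study_vars)
print(f"pooled OR={pooled_or:.2f} (95% CI {ci_lo:.2f}-{ci_hi:.2f}); I2={i2:.0f}%")
```

When heterogeneity is substantial, tau^2 grows, the study weights become more equal, and the pooled CI widens. This is consistent with a wide interval such as the 0.48-1.31 reported here alongside an I2 of 60%.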
The findings of this systematic review and meta-analysis suggest that potential applications of AI to aid in diagnosis from hip radiographs are promising. The performance of AI in diagnosing hip fractures was comparable with that of expert radiologists and surgeons. However, current implementations of AI for outcome prediction do not seem to provide substantial benefit over traditional multivariable predictive statistics.