Huhtanen Jarno T, Nyman Mikko, Blanco Sequeiros Roberto, Koskinen Seppo K, Pudas Tomi K, Kajander Sami, Niemi Pekka, Aronen Hannu J, Hirvonen Jussi
Faculty of Health and Well-being, Turku University of Applied Sciences, Joukahaisenkatu, Turku, 20520, Finland.
Department of Radiology, University of Turku, Turku, Finland.
Emerg Radiol. 2025 Jun 9. doi: 10.1007/s10140-025-02353-2.
Missed fractures are the primary cause of interpretation errors in emergency radiology, and artificial intelligence (AI) has recently shown great promise in radiograph interpretation. This study compared the diagnostic performance of two AI algorithms, BoneView and RBfracture, in detecting traumatic abnormalities (fractures and dislocations) on musculoskeletal (MSK) radiographs.
The AI algorithms analyzed 998 radiographs (585 normal, 413 abnormal), and their outputs were compared against the consensus of two MSK specialists as the reference standard. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and interobserver agreement (Cohen's kappa) were calculated. Robustness was assessed with 95% confidence intervals (CIs), and McNemar's tests compared sensitivity and specificity between the two AI algorithms.
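For illustration only, and not the authors' analysis code, the Python sketch below shows how these quantities (the diagnostic metrics with Wilson 95% CIs, an exact McNemar test on paired errors, and Cohen's kappa) can be computed from per-radiograph binary labels; every function and variable name here is an assumption for the example, not taken from the study.

import numpy as np
from scipy.stats import binomtest, norm

def wilson_ci(successes, n, alpha=0.05):
    # Wilson score confidence interval for a proportion (95% by default).
    z = norm.ppf(1 - alpha / 2)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

def diagnostic_metrics(y_true, y_pred):
    # y_true, y_pred: arrays of 0/1 (1 = traumatic abnormality present/called).
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity_ci": wilson_ci(tp, tp + fn),
        "specificity_ci": wilson_ci(tn, tn + fp),
    }

def mcnemar_exact_p(correct_a, correct_b):
    # Exact McNemar test on paired correct/incorrect calls of two readers:
    # only discordant pairs (one right, the other wrong) carry information.
    correct_a = np.asarray(correct_a, dtype=bool)
    correct_b = np.asarray(correct_b, dtype=bool)
    b = int(np.sum(correct_a & ~correct_b))
    c = int(np.sum(~correct_a & correct_b))
    return binomtest(b, b + c, 0.5).pvalue

def cohen_kappa(pred_a, pred_b):
    # Cohen's kappa for two binary raters: (observed - chance) / (1 - chance).
    pred_a = np.asarray(pred_a, dtype=float)
    pred_b = np.asarray(pred_b, dtype=float)
    po = np.mean(pred_a == pred_b)
    pe = np.mean(pred_a) * np.mean(pred_b) + np.mean(1 - pred_a) * np.mean(1 - pred_b)
    return (po - pe) / (1 - pe)

In a setup like the one described, diagnostic_metrics would be called once per algorithm against the specialist consensus, mcnemar_exact_p on the paired correct/incorrect indicators of the two algorithms, and cohen_kappa on their raw binary outputs.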
BoneView demonstrated a sensitivity of 0.893 (95% CI: 0.860-0.920), specificity of 0.885 (95% CI: 0.857-0.909), PPV of 0.846, NPV of 0.922, and accuracy of 0.889. RBfracture demonstrated a sensitivity of 0.872 (95% CI: 0.836-0.901), specificity of 0.892 (95% CI: 0.865-0.915), PPV of 0.851, NPV of 0.908, and accuracy of 0.884. No statistically significant differences were found in sensitivity (p = 0.151) or specificity (p = 0.708). Cohen's kappa was 0.81 (95% CI: 0.77-0.84), indicating almost perfect agreement between the two AI algorithms. Performance was similar in adults and children. Both AI algorithms struggled more with subtle abnormalities, which accounted for 66% and 70% of false negatives but only 20% and 18% of true positives for the two AI algorithms, respectively (p < 0.001).
BoneView and RBfracture exhibited high diagnostic performance and almost perfect agreement, with consistent results across adults and children, highlighting the potential of AI in emergency radiograph interpretation.