Department of Radiology, Bispebjerg and Frederiksberg Hospital, Copenhagen, Denmark; Radiologic Artificial Intelligence Testcenter, Bispebjerg, Frederiksberg, Herlev and Gentofte Hospitals, Copenhagen, Denmark.
Department of Radiology, Bispebjerg and Frederiksberg Hospital, Copenhagen, Denmark.
Eur J Radiol. 2022 May;150:110249. doi: 10.1016/j.ejrad.2022.110249. Epub 2022 Mar 12.
To externally validate an artificial intelligence (AI) tool for radiographic knee osteoarthritis severity classification on a clinical dataset.
This retrospective external validation study of a consecutive patient sample used weight-bearing, non-fixed-flexion posteroanterior knee radiographs from a clinical production PACS. The index test was ordinal Kellgren-Lawrence grading by an AI tool, two musculoskeletal radiology consultants, two reporting technologists, and two resident radiologists. All readers repeated the grading after at least four weeks. The reference test was the consensus of the two consultants. The primary outcome was quadratic weighted kappa. Secondary outcomes were ordinal weighted accuracy, multiclass accuracy, and F1-score.
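As an illustration of the primary outcome, the following is a minimal sketch of quadratic weighted kappa for two raters grading the same knees on the five-level Kellgren-Lawrence scale (0 to 4). It is not the study's analysis code: the function name `quadratic_weighted_kappa`, the parameter `n_grades`, and the example ratings are hypothetical and chosen only for demonstration.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, n_grades=5):
    """Cohen's kappa with quadratic weights for ordinal grades 0..n_grades-1.

    Assumption: grades are integers on the Kellgren-Lawrence scale (0-4),
    and both lists refer to the same knees in the same order.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("rating lists must be non-empty and of equal length")

    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts per grade pair
    marginal_a = Counter(rater_a)              # grade counts for rater A
    marginal_b = Counter(rater_b)              # grade counts for rater B
    max_distance = (n_grades - 1) ** 2

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(n_grades):
        for j in range(n_grades):
            # Penalty grows with the squared distance between the two grades.
            weight = (i - j) ** 2 / max_distance
            observed_disagreement += weight * observed[(i, j)] / n
            expected_disagreement += weight * marginal_a[i] * marginal_b[j] / (n * n)

    if expected_disagreement == 0:
        # Both raters used one identical grade throughout; kappa is undefined.
        return float("nan")
    return 1.0 - observed_disagreement / expected_disagreement


# Hypothetical ratings, for demonstration only (not study data).
identical = quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4])
reversed_grades = quadratic_weighted_kappa([0, 1, 2, 3, 4], [4, 3, 2, 1, 0])
```

Because the weights are quadratic, a disagreement of two grades is penalized four times as heavily as a disagreement of one grade, which suits an ordinal severity scale where near-misses matter less than large discrepancies.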
Fifty consecutive patients between September 24, 2019 and October 22, 2019 were retrospectively included (3 excluded), totaling 99 knees (1 excluded). Quadratic weighted kappa between the AI tool and the consultant consensus was 0.88 (95% CI 0.82-0.92). Agreement between the two consultants was 0.89 (95% CI 0.85-0.93). Intra-rater agreement for the consultants was 0.96 (95% CI 0.94-0.98) and 0.94 (95% CI 0.91-0.96), respectively, and for the AI tool it was 1 (95% CI 1-1). For the AI tool, ordinal weighted accuracy was 97.8% (95% CI 96.9-98.6%). Average multiclass accuracy and F1-score were 84% (83/99; 95% CI 77-91%) and 0.67 (95% CI 0.51-0.81), respectively.
The AI tool achieved the same good-to-excellent agreement with the radiology consultant consensus for radiographic knee osteoarthritis severity classification as the consultants did with each other.