Thomas Kevin A, Kidziński Łukasz, Halilaj Eni, Fleming Scott L, Venkataraman Guhan R, Oei Edwin H G, Gold Garry E, Delp Scott L
Departments of Biomedical Data Science (K.A.T., S.L.F., G.R.V.), Bioengineering (Ł.K., S.L.D.), and Radiology (G.E.G.), Stanford University, Clark Center, 318 Campus Dr, Room S321, Stanford, CA 94305; Department of Radiology, Erasmus University Rotterdam, Rotterdam, the Netherlands (E.H.G.O.); and Department of Mechanical Engineering, Carnegie Mellon University, Pittsburgh, Pa (E.H.).
Radiol Artif Intell. 2020 Mar 18;2(2):e190065. doi: 10.1148/ryai.2020190065.
To develop an automated model for staging knee osteoarthritis severity from radiographs and to compare its performance to that of musculoskeletal radiologists.
Radiographs from the Osteoarthritis Initiative, staged by a radiologist committee using the Kellgren-Lawrence (KL) system, were used. The images were standardized and augmented automatically before being used as input to a convolutional neural network model. The model was trained with 32 116 images, tuned with 4074 images, evaluated with a 4090-image test set, and compared with two individual radiologists on a 50-image subset of the test set. Saliency maps were generated to reveal the features the model used to determine KL grades.
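The image counts above imply a split of roughly 80/10/10. The short sketch below only redoes that arithmetic from the reported counts. The dictionary keys and variable names are illustrative and do not come from the study's code.

```python
# Image counts as reported in the abstract; the key names are illustrative.
splits = {"train": 32116, "tune": 4074, "test": 4090}

total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}

for name, count in splits.items():
    print(f"{name}: {count} images ({fractions[name]:.1%} of the total)")
print(f"total: {total} images")

# The radiologist comparison used a 50-image subset of the test set.
reader_subset = 50
print(f"reader subset: {reader_subset / splits['test']:.2%} of the test set")
```

The 50-image reader subset is a little over 1% of the test set, so the radiologist comparison rests on far fewer images than the full-test-set figures.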
With committee scores used as ground truth, the model had an average F1 score of 0.70 and an accuracy of 0.71 for the full test set. For the 50-image subset, the best individual radiologist had an average F1 score of 0.60 and an accuracy of 0.60; the model had an average F1 score of 0.64 and an accuracy of 0.66. Cohen weighted κ between the committee and model was 0.86, comparable to intraexpert repeatability. Saliency maps identified sites of osteophyte formation as influential to predictions.
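For readers unfamiliar with these metrics, the following is a minimal sketch of how accuracy, an average F1 score, and a weighted Cohen κ can be computed for ordinal grades such as KL 0 to 4. It rests on two assumptions the abstract does not confirm: that "average F1" means the unweighted (macro) mean of per-grade F1 scores, and that the κ weights are quadratic (`power=2`; `power=1` gives linear weights). The function names and the toy grades are hypothetical and are not study data.

```python
def accuracy(y_true, y_pred):
    """Fraction of images whose predicted grade equals the reference grade."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-grade F1 scores (assumed meaning of 'average F1')."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A grade that never appears in either list scores 0 by convention here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def weighted_kappa(y_true, y_pred, labels, power=2):
    """Cohen weighted kappa; power=2 is quadratic weighting (an assumption)."""
    k = len(labels)
    n = len(y_true)
    index = {c: i for i, c in enumerate(labels)}
    # Confusion matrix: rows are reference grades, columns are predicted grades.
    observed = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        observed[index[t]][index[p]] += 1
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power
            num += w * observed[i][j]
            den += w * row[i] * col[j] / n
    # den is 0 only if every grade in both lists is identical; not handled here.
    return 1.0 - num / den


# Toy example with made-up grades.
grades = [0, 1, 2, 3, 4]
committee = [0, 1, 2, 3, 4, 2, 3, 1]
model = [0, 1, 2, 3, 4, 3, 3, 2]

acc = accuracy(committee, model)
f1 = macro_f1(committee, model, grades)
kappa = weighted_kappa(committee, model, grades)
print(f"accuracy={acc:.2f}  macro F1={f1:.2f}  weighted kappa={kappa:.2f}")
```

Accuracy and F1 treat every misclassification equally, whereas the weighted κ penalizes a prediction more the further it lies from the reference grade. That is one reason the κ value can be considerably higher than the accuracy when most errors are off by a single grade.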
An end-to-end interpretable model was developed that takes full radiographs as input, predicts KL scores with state-of-the-art accuracy, performs as well as musculoskeletal radiologists, and does not require manual image preprocessing. Saliency maps suggest the model's predictions were based on clinically relevant information. © RSNA, 2020.