From the Department of Radiology and Biomedical Imaging (C.E.v.S., J.H.S., E.O., P.M.J., M.P., S.C.F., T.M.L., V.P.) and Department of Epidemiology and Biostatistics (F.L., M.C.N.), University of California, San Francisco, 185 Berry St, Suite 350, San Francisco, CA 94107; Department of Diagnostic and Interventional Radiology, Technische Universität München, Munich, Germany (C.E.v.S., S.C.F.); Department of Diagnostic and Interventional Radiology, Medical Center-University of Freiburg, Faculty of Medicine, Freiburg, Germany (P.M.J.); and Department of Radiology, University of California Davis Health, Sacramento, Calif (L.N.).
Radiology. 2020 Apr;295(1):136-145. doi: 10.1148/radiol.2020190925. Epub 2020 Feb 4.
Background: A multitask deep learning model might be useful in large epidemiologic studies wherein detailed structural assessment of osteoarthritis still relies on expert radiologists' readings. The potential of such a model in clinical routine should be investigated.

Purpose: To develop a multitask deep learning model for grading radiographic hip osteoarthritis features on radiographs and compare its performance to that of attending-level radiologists.

Materials and Methods: This retrospective study analyzed hip joints seen on weight-bearing anterior-posterior pelvic radiographs from participants in the Osteoarthritis Initiative (OAI). Participants were recruited from February 2004 to May 2006 for baseline measurements, and follow-up was performed 48 months later. Femoral osteophytes (FOs), acetabular osteophytes (AOs), and joint-space narrowing (JSN) were graded as absent, mild, moderate, or severe according to the Osteoarthritis Research Society International atlas. Subchondral sclerosis and subchondral cysts were graded as present or absent. By using split-sample validation, the participants were split into training (80%, n = 3494), validation (10%, n = 437), and testing (10%, n = 437) sets. The multitask neural network was based on DenseNet-161, a shared convolutional feature extractor trained with a multitask loss function. Model performance was evaluated in the internal test set from the OAI and in an external test set of routine clinical radiographs, which provided temporal and geographic validation.

Results: A total of 4368 participants (mean age, 61.0 years ± 9.2 [standard deviation]; 2538 women) were evaluated (15 364 hip joints on 7738 weight-bearing anterior-posterior pelvic radiographs). The accuracy of the model for assessing these five features was 86.7% (1333 of 1538) for FOs, 69.9% (1075 of 1538) for AOs, 81.7% (1257 of 1538) for JSN, 95.8% (1473 of 1538) for subchondral sclerosis, and 97.6% (1501 of 1538) for subchondral cysts in the internal test set, and 82.7% (86 of 104) for FOs, 65.4% (68 of 104) for AOs, 80.8% (84 of 104) for JSN, 88.5% (92 of 104) for subchondral sclerosis, and 91.3% (95 of 104) for subchondral cysts in the external test set.

Conclusion: A multitask deep learning model is a feasible approach to reliably assess radiographic features of hip osteoarthritis. © RSNA, 2020
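The abstract states that a shared feature extractor was trained with a multitask loss function but does not give the form of that loss. As a minimal sketch only, the block below assumes one common choice: an equally weighted sum of per-task cross-entropy terms, with three four-grade heads (FOs, AOs, JSN) and two binary heads (subchondral sclerosis, subchondral cysts). The task names, the equal weights, and the plain-Python logits are illustrative assumptions; the DenseNet-161 backbone is not reproduced here.

```python
import math

# Assumed task layout taken from the grading scheme in the abstract:
# three features graded absent/mild/moderate/severe, two graded present/absent.
TASKS = {"FO": 4, "AO": 4, "JSN": 4, "sclerosis": 2, "cysts": 2}


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(softmax(logits)[label])


def multitask_loss(outputs, labels, weights=None):
    """Weighted sum of per-task cross-entropies (equal weights are an assumption)."""
    if weights is None:
        weights = {task: 1.0 for task in TASKS}
    total = 0.0
    for task, n_classes in TASKS.items():
        if len(outputs[task]) != n_classes:
            raise ValueError(f"{task}: expected {n_classes} logits")
        total += weights[task] * cross_entropy(outputs[task], labels[task])
    return total


# Hypothetical head outputs for one hip joint (all-zero logits = uniform guess).
uniform_outputs = {task: [0.0] * n for task, n in TASKS.items()}
example_labels = {"FO": 1, "AO": 0, "JSN": 2, "sclerosis": 0, "cysts": 1}
uniform_loss = multitask_loss(uniform_outputs, example_labels)
```

With uniform predictions, each four-grade head contributes ln 4 and each binary head contributes ln 2, so the total is 8 ln 2. This gives a simple sanity baseline for an untrained network under the assumed loss.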
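The figures reported in the abstract can be checked for internal consistency with a few lines of arithmetic. The sketch below recomputes each accuracy from its stated numerator and denominator and confirms that the three split sizes add up to the 4368 participants. All numbers are copied from the abstract; the variable and function names are illustrative.

```python
# Participant counts per split, as reported in Materials and Methods.
SPLIT = {"training": 3494, "validation": 437, "testing": 437}
TOTAL_PARTICIPANTS = 4368

# (correct, total, reported accuracy in %) per feature, as reported in Results.
INTERNAL = {
    "FOs": (1333, 1538, 86.7),
    "AOs": (1075, 1538, 69.9),
    "JSN": (1257, 1538, 81.7),
    "subchondral sclerosis": (1473, 1538, 95.8),
    "subchondral cysts": (1501, 1538, 97.6),
}
EXTERNAL = {
    "FOs": (86, 104, 82.7),
    "AOs": (68, 104, 65.4),
    "JSN": (84, 104, 80.8),
    "subchondral sclerosis": (92, 104, 88.5),
    "subchondral cysts": (95, 104, 91.3),
}


def accuracy_percent(correct, total):
    """Accuracy as a percentage rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


def mismatches(results):
    """Return the features whose recomputed accuracy differs from the reported one."""
    return [
        feature
        for feature, (correct, total, reported) in results.items()
        if abs(accuracy_percent(correct, total) - reported) > 1e-9
    ]


split_total = sum(SPLIT.values())
split_shares = {name: round(100.0 * n / TOTAL_PARTICIPANTS, 1) for name, n in SPLIT.items()}
internal_mismatches = mismatches(INTERNAL)
external_mismatches = mismatches(EXTERNAL)
```

Both mismatch lists come back empty and the split sizes sum to 4368, so the reported percentages agree with their underlying counts.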