From the Departments of Radiology (D.B.L., M.P.L., S.S.H., C.P.L.), Computer Science (M.C.C.), and Biomedical Informatics (C.P.L.), Stanford University School of Medicine, 300 Pasteur Dr, Stanford, CA 94305-5105; and Department of Radiology, Children's Hospital Colorado, Aurora, Colo (N.V.S.).
Radiology. 2018 Apr;287(1):313-322. doi: 10.1148/radiol.2017170236. Epub 2017 Nov 2.
Purpose To compare the performance of a deep-learning bone age assessment model based on hand radiographs with that of expert radiologists and that of existing automated models.
Materials and Methods The institutional review board approved the study. A total of 14 036 clinical hand radiographs and corresponding reports were obtained from two children's hospitals to train and validate the model. For the first test set, composed of 200 examinations, the mean of bone age estimates from the clinical report and three additional human reviewers was used as the reference standard. Overall model performance was assessed by comparing the root mean square (RMS) and mean absolute difference (MAD) between the model estimates and the reference standard bone ages. Ninety-five percent limits of agreement were calculated in a pairwise fashion for all reviewers and the model. The RMS of a second test set composed of 913 examinations from the publicly available Digital Hand Atlas was compared with published reports of an existing automated model.
Results The mean difference between bone age estimates of the model and of the reviewers was 0 years, with a mean RMS and MAD of 0.63 and 0.50 years, respectively. The estimates of the model, the clinical report, and the three reviewers were within the 95% limits of agreement. RMS for the Digital Hand Atlas data set was 0.73 years, compared with 0.61 years for a previously reported model.
Conclusion A deep-learning convolutional neural network model can estimate skeletal maturity with accuracy similar to that of an expert radiologist and to that of existing automated models.
RSNA, 2017
An earlier incorrect version of this article appeared online. This article was corrected on January 19, 2018.
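The agreement metrics named in the Materials and Methods (RMS, MAD, and 95% limits of agreement) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so this assumes the conventional definitions: RMS and MAD of the paired differences, and limits of agreement as the mean difference plus or minus 1.96 sample standard deviations. The function names and the four bone ages below are hypothetical toy values, not study data.

```python
import math
from statistics import mean, stdev


def rms_difference(estimates, reference):
    """Root mean square of the paired differences (years)."""
    diffs = [e - r for e, r in zip(estimates, reference)]
    return math.sqrt(mean(d * d for d in diffs))


def mean_absolute_difference(estimates, reference):
    """Mean of the absolute paired differences (years)."""
    return mean(abs(e - r) for e, r in zip(estimates, reference))


def limits_of_agreement(a, b):
    """Assumed convention: mean difference +/- 1.96 * sample SD of differences."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias - spread, bias + spread


# Hypothetical bone ages in years (illustrative only, not from the study).
model = [10.0, 12.5, 8.0, 15.0]
reference = [10.5, 12.0, 8.5, 14.5]

print(rms_difference(model, reference))            # 0.5
print(mean_absolute_difference(model, reference))  # 0.5
print(limits_of_agreement(model, reference))
```

For the pairwise comparison described in the abstract, `limits_of_agreement` would be called once for each pair of raters (each reviewer, the clinical report, and the model), under the same assumption about how the limits are defined.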