Pape Johanna, Rosolowski Maciej, Pfäffle Roland, Beeskow Anne B, Gräfe Daniel
Department of Pediatric Radiology, University Hospital, 04103, Leipzig, Germany.
Institute for Medical Informatics, Statistics and Epidemiology, Leipzig University, 04107, Leipzig, Germany.
Eur Radiol. 2025 Mar;35(3):1190-1196. doi: 10.1007/s00330-024-11169-6. Epub 2024 Nov 5.
To date, AI-supported programs for bone age (BA) determination according to Greulich and Pyle (G&P) that are available for medical use in Europe have almost exclusively been validated individually. The current study therefore aimed to compare the performance of three such programs, BoneXpert, PANDA, and BoneView, on a single Central European population.
This retrospective study included hand radiographs of 306 children aged 1-18 years, stratified by gender and age. A subgroup was formed from the age range that accounts for 90% of examinations in clinical practice. The G&P BA was estimated by three human experts, whose estimates served as ground truth, and by three AI-supported programs. The mean absolute deviation, the root mean squared error (RMSE), and the number of dropouts by the AI were calculated.
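The error measures named above can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: the ground truth is taken as the mean of the three expert readings per child, the mean absolute deviation is interpreted as the mean absolute difference between a program's estimate and that ground truth, and a dropout is a radiograph for which a program returns no estimate (marked `None`). The function names `consensus` and `evaluate_program` are hypothetical.

```python
import math

def consensus(expert_readings):
    """Assumed ground truth: mean of the expert readings (years) for one child."""
    return sum(expert_readings) / len(expert_readings)

def evaluate_program(ground_truth, predictions):
    """Return (mean absolute deviation, RMSE, dropout rate) for one program.

    ground_truth: bone ages in years, one per radiograph.
    predictions:  program estimates in years; None marks a dropout.
    Dropouts are excluded from the error measures and counted separately.
    """
    if len(ground_truth) != len(predictions):
        raise ValueError("ground_truth and predictions must have equal length")
    # Keep only radiographs the program actually analysed.
    errors = [p - g for g, p in zip(ground_truth, predictions) if p is not None]
    if not errors:
        raise ValueError("no radiographs were analysed")
    mad = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    dropout_rate = 1 - len(errors) / len(predictions)
    return mad, rmse, dropout_rate

# Illustrative values only, not data from the study.
truth = [consensus(r) for r in ([10.0, 10.5, 9.5], [12.0, 12.0, 12.0],
                                [8.0, 8.5, 7.5], [5.0, 5.0, 5.0])]
program = [10.5, 11.5, None, 5.0]
mad, rmse, dropout_rate = evaluate_program(truth, program)
```

Reporting the dropout rate alongside the RMSE matters because a program that declines difficult radiographs is scored on an easier set of images than one that analyses all of them.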
All programs correlated strongly with the ground truth (R ≥ 0.98). In the total group, BoneXpert had a lower RMSE than BoneView and PANDA (0.62 vs. 0.65 and 0.75 years), with dropout rates of 2.3%, 20.3%, and 0%, respectively. In the subgroup, the RMSE differed less (0.66 vs. 0.68 and 0.65 years, with at most 4% dropouts). The standard deviation between the AI readers was lower than that between the human readers (0.54 vs. 0.62 years, p < 0.01).
All three AI programs predict BA according to G&P with similarly high reliability in the main age range. Differences arise at the boundaries of childhood.
Question: Comparative, independent validation of artificial intelligence-based bone age estimation in children is lacking. Findings: Three commercially available programs estimate bone age according to Greulich and Pyle with similarly high reliability in a Central European cohort. Clinical relevance: This comparative study will help readers choose a bone age estimation software approved for the European market, depending on the targeted age group and economic considerations.