Lotfi Mehrzad, Abolpour Nahid, Ghasemi Mohammadreza, Heydari Hajar, Pourghayumi Reza
Department of Radiology, Medical Imaging Research Center, Shiraz University of Medical Sciences, Shiraz, Iran.
Department of Artificial Intelligence, Shiraz University of Medical Sciences, Shiraz, Iran.
Arch Iran Med. 2025 Apr 1;28(4):198-206. doi: 10.34172/aim.32070.
To investigate whether the bone age (BA) of Iranian children could be accurately assessed via an artificial intelligence (AI) system. Accurate assessment of skeletal maturity is crucial for diagnosing and treating various musculoskeletal disorders, and is traditionally achieved through manual comparison with the Greulich-Pyle atlas. This process, however, is subjective and time-consuming. Recent advances in deep learning offer more efficient and consistent BA evaluations.
A total of 555 left-hand radiographs (220 boys and 335 girls) were collected from children aged 1-18 years who presented to a tertiary research hospital. The reference BA was determined via the Greulich and Pyle (GP) method by two radiologists in consensus. The BA was then estimated using a deep learning model specifically developed for this population. Model performance was evaluated using multiple metrics: mean square error (MSE), mean absolute error (MAE), intra-class correlation coefficient (ICC), and 95% limits of agreement (LoA). Gender-specific results were analyzed separately.
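As an illustration of the four agreement metrics named above, the following is a minimal sketch and not the authors' code. It assumes paired reference and predicted bone ages in years, Bland-Altman limits of agreement computed as bias ± 1.96 SD of the differences, and a two-way random-effects, absolute-agreement, single-measure ICC (often written ICC(2,1)); the abstract does not state which ICC form was used. The function name `agreement_metrics` and the sample values are hypothetical.

```python
import numpy as np


def agreement_metrics(reference, predicted):
    """Return MSE, MAE, bias, 95% limits of agreement and ICC(2,1).

    reference, predicted: paired bone ages in years (hypothetical inputs).
    The ICC form is an assumption; the abstract does not specify it.
    """
    ref = np.asarray(reference, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    diff = pred - ref

    mse = float(np.mean(diff ** 2))      # note: units are years squared
    mae = float(np.mean(np.abs(diff)))   # units are years

    # Bland-Altman 95% limits of agreement: bias +/- 1.96 * SD of differences.
    bias = float(np.mean(diff))
    sd = float(np.std(diff, ddof=1))
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)

    # ICC(2,1) from a two-way ANOVA with subjects as rows and
    # the two "raters" (reference, model) as columns.
    data = np.column_stack([ref, pred])
    n, k = data.shape
    grand = data.mean()
    ss_rows = k * np.sum((data.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((data.mean(axis=0) - grand) ** 2)
    ss_err = np.sum((data - grand) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    icc = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

    return {"mse": mse, "mae": mae, "bias": bias, "loa": loa, "icc": float(icc)}


# Made-up example values, for illustration only (not study data).
example = agreement_metrics(
    [5.0, 8.0, 10.0, 12.0, 15.0],
    [5.5, 7.5, 10.5, 12.5, 14.5],
)
print(example)
```

Running the function separately on the boys' and girls' subsets would mirror the gender-specific analysis described above.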
The model demonstrated acceptable accuracy. For boys, MSE was 0.55 years², MAE was 0.59 years, ICC was 0.74, and the 95% LoA ranged from -0.8 to 1.2 years. For girls, MSE was 0.59 years², MAE was 0.61 years, ICC was 0.82, and the 95% LoA ranged from -0.6 to 1.0 years. These results indicate stronger predictive accuracy for girls than for boys.
Our findings demonstrate that the proposed deep learning model achieves reasonable accuracy in BA assessment, with stronger performance in girls than in boys. However, the relatively wide 95% LoA, particularly for boys, and prediction errors at the extremes of the age range highlight the need for further refinement and validation. While the model shows potential as a supplementary tool for clinicians, future studies should focus on improving prediction accuracy, reducing variability, and validating the model on larger, more diverse datasets before widespread clinical implementation is considered. Additionally, addressing edge cases and specific conditions that a human reviewer may detect but the model might overlook will be essential for enhancing its clinical reliability.