Özmen Evrim, Özen Atalay Hande, Uzer Evren, Veznikli Mert
Koç University Hospital, Department of Radiology, İstanbul, Türkiye.
Koç University Faculty of Medicine, Department of Computational Biology and Biostatistics, İstanbul, Türkiye.
Diagn Interv Radiol. 2024 Sep 2. doi: 10.4274/dir.2024.242790.
This study aimed to evaluate the validity of two artificial intelligence (AI)-based bone age assessment programs, BoneXpert and VUNO Med-Bone Age (VUNO), by comparing them with manual assessment using the Greulich-Pyle method in Turkish children.
This study included a cohort of 292 pediatric cases aged 1 to 15 years, with an equal number of cases and an equal gender distribution in each age group. Two radiologists, blinded to the AI-determined bone age, independently assessed bone age. Agreement between the manual and AI-based assessments was measured with the intraclass correlation coefficient (ICC).
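The abstract does not state which ICC form was used. As an illustration only, the sketch below assumes the common two-rater agreement form ICC(2,1) in Shrout and Fleiss notation (two-way random effects, absolute agreement, single rater). Each row is one radiograph and each column is one rater or method (for example, radiologist vs. software). The function name `icc_2_1` is hypothetical, and the study may have used a different ICC model.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    Assumed form for illustration; rows are subjects (radiographs),
    columns are raters or methods (e.g. radiologist, software).
    """
    n = len(ratings)                   # number of subjects
    k = len(ratings[0]) if n else 0    # number of raters/methods
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n >= 2 and k >= 2")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA without replication: partition the total sum of squares.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement: systematic rater bias (ms_cols) lowers the ICC.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Because this form penalizes systematic offsets between raters, a program that tracks the radiologist closely but reads consistently older would score lower here than under a consistency-only ICC.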
The ICCs for agreement between the two radiologists' manual measurements indicated almost perfect agreement. When all cases were analyzed together, regardless of gender and age group, almost perfect positive agreement was observed between the manual and software measurements. When bone age calculations were analyzed separately for boys and girls, no statistically significant difference was found between the two AI-based methods in any subgroup. For boys of all ages, the ICCs were 0.995 for VUNO and 0.994 for BoneXpert (z = 1.597, P = 0.110); for girls, they were 0.994 and 0.995, respectively (z = -1.303, P = 0.193). Overall agreement with manual measurements was high for both VUNO and BoneXpert and remained consistent across age groups in both boys and girls. These findings indicate that both AI-based bone age assessment tools agree closely with manual measurements across all age and gender groups, with neither method significantly superior to the other.
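The reported P values are consistent with two-sided tests of each z statistic against a standard normal distribution. The abstract does not say how the z statistics comparing the two programs' ICCs were derived, so the sketch below covers only the final step, converting z to a two-sided P value. The helper name `two_sided_p` is hypothetical.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided P value for a z statistic under a standard normal null."""
    # P(|Z| >= |z|) = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


# z statistics reported for the VUNO vs. BoneXpert comparison
for label, z in [("boys", 1.597), ("girls", -1.303)]:
    print(f"{label}: z = {z:.3f}, P = {two_sided_p(z):.3f}")
```

Rounded to three decimals, this reproduces P = 0.110 for boys and P = 0.193 for girls. Both values are well above 0.05, matching the abstract's conclusion of no significant difference between the programs.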
Both BoneXpert and VUNO demonstrated high validity in assessing bone age, with no statistically significant differences between the two methods across gender or pubertal status groups. Notably, this study represents the first evaluation of both BoneXpert and VUNO for bone age assessment in Turkish children, highlighting their potential as reliable and clinically relevant tools for this population.
Investigating the most suitable AI program for the Turkish population could be clinically significant.