Wang Xi, Zhou Bo, Gong Ping, Zhang Ting, Mo Yan, Tang Jie, Shi Xinmiao, Wang Jianhong, Yuan Xinyu, Bai Fengsen, Wang Lei, Xu Qi, Tian Yu, Ha Qing, Huang Chencui, Yu Yizhou, Wang Lin
Department of Child Health Care, Children's Hospital, Capital Institute of Pediatrics, Beijing, China.
Deepwise AI Lab, Beijing, China.
Front Pediatr. 2022 Feb 24;10:818061. doi: 10.3389/fped.2022.818061. eCollection 2022.
The accuracy and consistency of bone age assessments (BAA) using standard methods can vary with physicians' level of experience.
To assess the impact of information from an artificial intelligence (AI) deep learning convolutional neural network (CNN) model on BAA, specialists with different levels of experience (junior, mid-level, and senior) assessed radiographs from 316 children aged 4-18 years, which had been randomly divided into two equal sets (group A and group B). Each specialist assessed bone age (BA) independently, without additional information for group A and with information from the model for group B. With the mean assessment of four experts as the reference standard, the mean absolute error (MAE) and the intraclass correlation coefficient (ICC) were calculated to evaluate accuracy and consistency, respectively. Individual assessments of 13 bones (radius, ulna, and short bones) were also compared between group A and group B with the rank-sum test.
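The abstract names the accuracy and consistency metrics but not their exact formulas. The sketch below assumes that MAE is computed against the per-radiograph mean of the four experts, and that ICC is the two-way random-effects, absolute-agreement, single-rater form ICC(2,1) computed between one physician and that reference; the ICC form actually used in the study is not stated. The function names are hypothetical, and the toy values are invented for illustration.

```python
import numpy as np


def reference_standard(expert_ratings):
    """Per-radiograph mean of the expert readings (hypothetical helper).

    expert_ratings: array of shape (n_radiographs, n_experts), e.g. four experts.
    Returns an array of shape (n_radiographs,).
    """
    return np.asarray(expert_ratings, dtype=float).mean(axis=1)


def mean_absolute_error(assessed, reference):
    """MAE between one reader's bone ages and the reference standard."""
    assessed = np.asarray(assessed, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(assessed - reference)))


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater
    (Shrout and Fleiss). This form is an assumption; the abstract does not say.

    ratings: array of shape (n_subjects, k_raters).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares for subjects (rows), raters (columns), and residual error.
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_total = np.sum((x - grand) ** 2)
    ss_error = ss_total - ss_rows - ss_cols
    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return float(
        (ms_rows - ms_error)
        / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)
    )


if __name__ == "__main__":
    # Toy bone ages in years, invented for illustration only (not study data).
    experts = [[10.0, 10.5, 10.2, 10.3], [12.1, 12.0, 12.4, 12.3], [8.0, 8.2, 7.9, 8.1]]
    physician = [10.6, 12.0, 8.5]
    ref = reference_standard(experts)
    print("MAE:", mean_absolute_error(physician, ref))
    # Consistency: agreement between the physician and the reference, k = 2 columns.
    print("ICC(2,1):", icc_2_1(np.column_stack([physician, ref])))
```

Because ICC(2,1) measures absolute agreement, a reader who is systematically offset from the reference is penalized even when the ranking of children is preserved, which matches the intent of using ICC as a consistency check against a reference standard.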
With AI assistance, the accuracy of senior, mid-level, and junior physicians was significantly better (all P < 0.001; MAEs 0.325, 0.344, and 0.370, respectively) than without AI assistance (MAEs 0.403, 0.469, and 0.755, respectively). Consistency was also significantly higher for senior, mid-level, and junior physicians with AI assistance (all P < 0.001; ICCs 0.996, 0.996, and 0.992, respectively) than without it (ICCs 0.987, 0.989, and 0.941, respectively). At every level of experience, accuracy with AI assistance was significantly better than accuracy without it for assessments of the first and fifth proximal phalanges.
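The per-bone results come from a rank-sum test between the two independent groups of radiographs. Below is a minimal standard-library sketch of the two-sided Wilcoxon rank-sum test using the large-sample normal approximation without tie correction (the same statistic `scipy.stats.ranksums` reports). The abstract does not say which per-bone quantity was ranked, so treating it as each reader's absolute error for one bone is an assumption; the function names and toy values are hypothetical.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction).

    Returns (z, p), where z > 0 means x tends to be larger than y.
    """
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    w = sum(ranks[:n1])                       # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2           # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))      # equals 2 * (1 - Phi(|z|))
    return z, p


if __name__ == "__main__":
    # Toy per-bone absolute errors for one bone, invented for illustration only.
    group_a = [1.0, 0.5, 1.5, 2.0, 1.0]  # without AI assistance
    group_b = [0.0, 0.5, 0.5, 0.0, 1.0]  # with AI assistance
    z, p = rank_sum_test(group_a, group_b)
    print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

An unpaired test fits here because groups A and B contain different radiographs; a paired test (such as the Wilcoxon signed-rank test) would only apply if the same radiographs were read under both conditions.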
Information from an AI model improves both the accuracy and the consistency of bone age assessments by physicians at all levels of experience. The first and fifth proximal phalanges are difficult to assess and warrant closer attention.