Kunz Felix, Stellzig-Eisenhauer Angelika, Widmaier Lisa Marie, Zeman Florian, Boldt Julian
Department of Orthodontics, University Hospital of Würzburg, Pleicherwall 2, 97070, Würzburg, Germany.
Centre for Clinical Studies, University Hospital of Regensburg, Regensburg, Germany.
J Orofac Orthop. 2025 May;86(3):145-160. doi: 10.1007/s00056-023-00491-1. Epub 2023 Aug 29.
The aim of this investigation was to evaluate the accuracy of various skeletal and dental cephalometric parameters as produced by different commercial providers that make use of artificial intelligence (AI)-assisted automated cephalometric analysis and to compare their quality to a gold standard established by orthodontic experts.
Twelve experienced orthodontic examiners pinpointed 15 radiographic landmarks on a total of 50 cephalometric X‑rays. The landmarks were used to generate 9 parameters for orthodontic treatment planning. The "humans' gold standard" was defined as the median of all 12 human assessments for each parameter. These medians served as reference values for comparison with the results of four commercial providers of automated cephalometric analyses (DentaliQ.ortho [CellmatiQ GmbH, Hamburg, Germany], WebCeph [AssembleCircle Corp, Seongnam-si, Korea], AudaxCeph [Audax d.o.o., Ljubljana, Slovenia], CephX [Orca Dental AI, Herzliya, Israel]). Repeated measures analyses of variance (ANOVAs) were calculated and Bland-Altman plots were generated for the comparisons.
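The reference-value and agreement steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' analysis code: the function names and the sample values are hypothetical, and the 1.96 multiplier for the limits of agreement is the conventional choice for Bland-Altman analysis rather than something stated in the abstract.

```python
from statistics import mean, median, stdev


def gold_standard(ratings_per_image):
    """Median across examiners for each image (for one parameter)."""
    return [median(ratings) for ratings in ratings_per_image]


def bland_altman(predicted, reference):
    """Return (bias, lower limit, upper limit) of agreement.

    Assumption: limits are bias +/- 1.96 * SD of the differences,
    the conventional 95% limits of agreement.
    """
    diffs = [p - r for p, r in zip(predicted, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical values for one angular parameter (degrees):
# three images, three examiners each (the study used 50 images, 12 examiners).
human_ratings = [
    [80.0, 81.0, 82.0],
    [78.0, 79.0, 83.0],
    [84.0, 85.0, 86.0],
]
ai_prediction = [82.0, 79.0, 87.0]  # hypothetical provider output

reference = gold_standard(human_ratings)
bias, lower, upper = bland_altman(ai_prediction, reference)
print(reference, bias, lower, upper)
```

The median is less sensitive than the mean to a single outlying examiner, which may be why it suits a multi-rater reference. A wide interval between the lower and upper limits would correspond to the reduced precision the abstract reports for incisor inclination.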
The repeated measures ANOVAs indicated significant differences between the commercial providers' predictions and the humans' gold standard for all nine investigated parameters. However, the pairwise comparisons also showed major differences among the four commercial providers. While there were no significant mean differences between the values of DentaliQ.ortho and the humans' gold standard, the predictions of AudaxCeph deviated significantly in seven out of nine parameters. In addition, the Bland-Altman plots showed that reduced precision of AI predictions must be expected, especially for values describing the inclination of the incisors.
Fully automated cephalometric analyses are promising in terms of timesaving and avoidance of individual human errors. At present, however, they should only be used under supervision of experienced clinicians.