Evaluating the accuracy of generative artificial intelligence models in dental age estimation based on the Demirjian's method.

Author information

Abuabara Allan, do Nascimento Thais Vilalba Paniagua Machado, Trentini Seandra Maria, Costa Gonçalves Angela Mairane, Hueb de Menezes-Oliveira Maria Angélica, Madalena Isabela Ribeiro, Beisel-Memmert Svenja, Kirschneck Christian, Antunes Livia Azeredo Alves, Miranda de Araujo Cristiano, Baratto-Filho Flares, Küchler Erika Calvano

Affiliations

Post-Graduation Program in Health and Environment, University from the Joinville Region - Univille, Joinville, Brazil.

School of Dentistry, Tuiuti University of Paraná - UTP, Curitiba, Brazil.

Publication information

Front Dent Med. 2025 Jul 29;6:1634006. doi: 10.3389/fdmed.2025.1634006. eCollection 2025.

Abstract

INTRODUCTION

Dental age estimation plays a key role in forensic identification, clinical diagnosis, treatment planning, and prognosis in fields such as pediatric dentistry and orthodontics. Large language models (LLMs) are increasingly recognized for their potential applications in dentistry. This study aimed to compare the performance of currently available generative artificial intelligence LLMs in estimating dental age from Demirjian's scores.

METHODS

Panoramic radiographs were analyzed using Demirjian's method (1973), with each left permanent mandibular tooth classified from stage A to H. Three untrained LLMs (ChatGPT [GPT-4-turbo], Gemini 2.0 Flash, and DeepSeek-V3) were tasked with estimating dental age from the patient's Demirjian score for each tooth. Because these models are probabilistic and can give different answers to the same question, each model was queried for every case on three different computers per day, on three separate days, yielding three responses per case per day. The LLM age estimates were compared with the individuals' chronological ages. Intra- and inter-examiner reliability was assessed using the Intraclass Correlation Coefficient (ICC). Model performance was evaluated using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), the Coefficient of Determination (R²), and bias.
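
The abstract names the evaluation statistics but not their formulas. The sketch below uses the standard textbook definitions under stated assumptions: bias is taken as the mean of (estimated minus chronological age), so a positive value means overestimation, consistent with the sign of the overestimation biases reported in the results; the nine responses per case (three computers x three days) are simply averaged, which the abstract does not confirm; and the ICC form is assumed to be ICC(2,1) (two-way random effects, absolute agreement, single rater), since the abstract does not say which form was used. The function names `aggregate_responses`, `age_metrics`, and `icc_2_1` are hypothetical, and the numbers in the demo are synthetic, not study data.

```python
import numpy as np


def aggregate_responses(responses) -> np.ndarray:
    """Collapse repeated LLM answers to one age estimate per case.

    `responses` has shape (n_cases, n_repeats); with 3 computers x 3 days
    there would be 9 repeats per case. Averaging is an ASSUMPTION made for
    illustration; per-day or per-response analysis is also possible.
    """
    return np.asarray(responses, dtype=float).mean(axis=1)


def age_metrics(y_true, y_pred) -> dict:
    """MAE, RMSE, R^2 and bias of estimated vs chronological age (years)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true  # positive = age overestimated
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        # R^2 < 0 means the model does worse than always predicting the mean age
        "R2": float(1.0 - ss_res / ss_tot),
        "Bias": float(np.mean(err)),
    }


def icc_2_1(ratings) -> float:
    """ICC(2,1) after Shrout & Fleiss: two-way random effects, absolute
    agreement, single rater. `ratings` has shape (n_subjects, k_raters),
    e.g. Demirjian stages coded A=1 ... H=8 (this coding is an ASSUMPTION)."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between raters
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))


if __name__ == "__main__":
    # Synthetic toy data, NOT the study's data.
    chrono = [8.0, 10.0, 12.0]
    llm = [[9.0, 10.0, 11.0], [11.0, 12.0, 13.0], [13.0, 14.0, 15.0]]
    print(age_metrics(chrono, aggregate_responses(llm)))
    # {'MAE': 2.0, 'RMSE': 2.0, 'R2': -0.5, 'Bias': 2.0}
    print(icc_2_1([[1, 2], [3, 4], [5, 6]]))  # 8/9, about 0.889
```

In the toy example every case is overestimated by exactly 2 years, so the estimates keep the correct ordering of ages, yet R² is still negative: a systematic offset alone can make a model score worse than simply predicting the mean age.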

RESULTS

Thirty panoramic radiographs (40% female, 60% male; mean age 10.4 ± 2.32 years) were included. Both intra- and inter-examiner ICC values exceeded 0.85. ChatGPT and DeepSeek showed comparable but suboptimal performance, with higher errors (MAE: 1.98-2.05 years; RMSE: 2.33-2.35 years), negative R² values (-0.069 to -0.049), and substantial overestimation bias (1.90-1.91 years), indicating poor model fit and systematic errors. Gemini showed intermediate results, with a moderate MAE (1.57 years) and RMSE (1.81 years), a positive R² (0.367), and a lower bias (1.32 years).
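
The reported figures can be cross-checked with two standard identities, assuming R² is computed as 1 - SS_res/SS_tot against the 30 chronological ages and that 2.32 years is the sample standard deviation of those ages, so their population variance is about 2.32² x 29/30 ≈ 5.20 years². A negative R² is then equivalent to an RMSE larger than the spread of the ages themselves, and writing RMSE² as squared bias plus the population variance of the errors (σ_e²) shows how much of the error is systematic. Because all inputs are rounded, the recomputed values only approximately match the reported ones.

```latex
\begin{aligned}
R^2 &= 1 - \frac{\sum_i (\hat{y}_i - y_i)^2}{\sum_i (y_i - \bar{y})^2}
     = 1 - \frac{\mathrm{RMSE}^2}{\sigma_y^2},
  \qquad \sigma_y^2 \approx 2.32^2 \cdot \tfrac{29}{30} \approx 5.20 \\[4pt]
\text{ChatGPT/DeepSeek:}\quad R^2 &\approx 1 - \frac{2.33^2 \text{ to } 2.35^2}{5.20}
  \approx -0.04 \text{ to } -0.06 \quad (\text{reported } -0.069 \text{ to } -0.049) \\
\text{Gemini:}\quad R^2 &\approx 1 - \frac{1.81^2}{5.20} \approx 0.37
  \quad (\text{reported } 0.367) \\[4pt]
\mathrm{RMSE}^2 &= \mathrm{Bias}^2 + \sigma_e^2 \\
\text{ChatGPT/DeepSeek:}\quad \sigma_e^2 &\approx 2.33^2 - 1.91^2 \text{ to } 2.35^2 - 1.90^2
  \approx 1.8 \text{ to } 1.9, \quad \mathrm{Bias}^2/\mathrm{RMSE}^2 \approx 0.65 \text{ to } 0.67 \\
\text{Gemini:}\quad \sigma_e^2 &\approx 1.81^2 - 1.32^2 \approx 1.53,
  \quad \mathrm{Bias}^2/\mathrm{RMSE}^2 \approx 0.53
\end{aligned}
```

On these rounded figures, roughly two thirds of the squared error for ChatGPT and DeepSeek, and about half for Gemini, comes from the constant overestimation rather than from case-to-case scatter.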

DISCUSSION

This study demonstrated that, although LLMs such as ChatGPT, Gemini, and DeepSeek can estimate dental age from Demirjian's scores, their performance remains inferior to that of the traditional method. Among them, DeepSeek-V3 showed the best results, but all models require task-specific training and validation before clinical application.
