Evaluating the accuracy of generative artificial intelligence models in dental age estimation based on the Demirjian's method.

Author information

Abuabara Allan, do Nascimento Thais Vilalba Paniagua Machado, Trentini Seandra Maria, Costa Gonçalves Angela Mairane, Hueb de Menezes-Oliveira Maria Angélica, Madalena Isabela Ribeiro, Beisel-Memmert Svenja, Kirschneck Christian, Antunes Livia Azeredo Alves, Miranda de Araujo Cristiano, Baratto-Filho Flares, Küchler Erika Calvano

Affiliations

Post-Graduation Program in Health and Environment, University from the Joinville Region - Univille, Joinville, Brazil.

School of Dentistry, Tuiuti University of Paraná - UTP, Curitiba, Brazil.

Publication information

Front Dent Med. 2025 Jul 29;6:1634006. doi: 10.3389/fdmed.2025.1634006. eCollection 2025.

DOI:10.3389/fdmed.2025.1634006

PMID:40800006

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12339434/

Abstract

INTRODUCTION

Dental age estimation plays a key role in forensic identification, clinical diagnosis, treatment planning, and prognosis in fields such as pediatric dentistry and orthodontics. Large language models (LLMs) are increasingly recognized for their potential applications in dentistry. This study aimed to compare the performance of currently available generative artificial intelligence LLMs in estimating dental age from Demirjian's scores.

METHODS

Panoramic radiographs were analyzed using Demirjian's method (1973), with each left permanent mandibular tooth classified from stage A to H. Three untrained LLMs, ChatGPT (GPT-4-turbo), Gemini 2.0 Flash, and DeepSeek-V3, were tasked with estimating dental age from the patient's Demirjian score for each tooth. Because these models are probabilistic and can give different responses to the same question, three responses per case were collected from each model (on three different computers) on each of three separate days. The LLM age estimates were compared with the individuals' chronological ages. Intra- and inter-examiner reliability was assessed using the Intraclass Correlation Coefficient (ICC). Model performance was evaluated using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Coefficient of Determination (R²), and Bias.
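The four performance metrics named above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `regression_metrics` and the sample ages are hypothetical, and bias is assumed to be the mean signed error (estimated minus chronological age), so a positive value means overestimation.

```python
import math


def regression_metrics(estimated, chronological):
    """Return MAE, RMSE, R^2 and bias for paired age values (in years)."""
    if len(estimated) != len(chronological) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(estimated)
    # Signed error per case: positive means the model overestimated the age.
    errors = [e - c for e, c in zip(estimated, chronological)]
    mae = sum(abs(d) for d in errors) / n
    rmse = math.sqrt(sum(d * d for d in errors) / n)
    bias = sum(errors) / n
    mean_chrono = sum(chronological) / n
    ss_res = sum(d * d for d in errors)
    ss_tot = sum((c - mean_chrono) ** 2 for c in chronological)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all chronological ages are equal")
    # R^2 drops below zero when the estimates are worse than always
    # predicting the mean chronological age.
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2, "bias": bias}


# Hypothetical values for illustration only, not data from the study.
chronological = [8.0, 10.0, 12.0]
estimated = [9.0, 12.0, 13.0]
metrics = regression_metrics(estimated, chronological)
print(metrics)
```

Computing R² directly from the residual and total sums of squares, rather than squaring a correlation coefficient, is what allows it to take negative values.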

RESULTS

Thirty panoramic radiographs (40% female, 60% male; mean age 10.4 ± 2.32 years) were included. Both intra- and inter-examiner ICC values exceeded 0.85. ChatGPT and DeepSeek exhibited comparable but suboptimal performance, with higher errors (MAE: 1.98-2.05 years; RMSE: 2.33-2.35 years), negative R² values (-0.069 to -0.049), and substantial overestimation biases (1.90-1.91 years), indicating poor model fit and systematic flaws. Gemini demonstrated intermediate results, with a moderate MAE (1.57 years) and RMSE (1.81 years), a positive R² (0.367), and a lower bias (1.32 years).
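As a rough consistency check on the figures above, R² can be approximated from the reported RMSE and the spread of chronological ages, since R² = 1 - MSE / (mean squared deviation of the reference values). The sketch below assumes that the reported 2.32 years is a sample standard deviation (n - 1 denominator) and that every case contributes equally to the pooled metrics; neither assumption is stated in the abstract, and `r2_from_rmse` is a hypothetical helper.

```python
def r2_from_rmse(rmse, sd, n):
    """Approximate R^2 from RMSE and the sample SD of the reference values."""
    # Convert the sample variance (n - 1 denominator) into the mean squared
    # deviation (n denominator) used in the R^2 definition.
    mean_sq_dev = sd ** 2 * (n - 1) / n
    return 1.0 - rmse ** 2 / mean_sq_dev


N_CASES = 30   # radiographs included, from the abstract
SD_AGE = 2.32  # SD of chronological age in years, from the abstract

# Reported RMSE values (years), from the abstract.
reported_rmse = {
    "Gemini": 1.81,
    "ChatGPT/DeepSeek, lower bound": 2.33,
    "ChatGPT/DeepSeek, upper bound": 2.35,
}

approx_r2 = {
    label: r2_from_rmse(rmse, SD_AGE, N_CASES)
    for label, rmse in reported_rmse.items()
}
for label, value in approx_r2.items():
    print(f"{label}: approximate R^2 = {value:.3f}")
```

Under these assumptions the approximation gives about 0.37 for Gemini and slightly negative values for ChatGPT and DeepSeek, in line with the reported signs. An RMSE larger than the spread of the chronological ages is exactly the condition under which R² becomes negative. The small differences from the published values are expected given rounding of the inputs.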

DISCUSSION

This study demonstrated that, although LLMs such as ChatGPT, Gemini, and DeepSeek can estimate dental age using Demirjian's scores, their performance remains inferior to the traditional method. Among them, Gemini showed the best results, with the lowest MAE, RMSE, and bias and the only positive R², but all models require task-specific training and validation before clinical application.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4863/12339434/feb017e88fd3/fdmed-06-1634006-g001.jpg
