Evaluation of the Reliability of AI-Based Large Language Models in Developing Orthodontic Treatment Plans.

Authors

Sorel Makara, Gurrala Chaitanya, Tadinada Aditya

Affiliations

Oral and Maxillofacial Radiology, University of Connecticut (UConn) School of Dental Medicine, Farmington, USA.

Orthodontics and Dentofacial Orthopedics, University of Connecticut (UConn) School of Dental Medicine, Farmington, USA.

Publication information

Cureus. 2025 Jul 31;17(7):e89149. doi: 10.7759/cureus.89149. eCollection 2025 Jul.

DOI:10.7759/cureus.89149
PMID:40896057
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12398357/
Abstract

Background and aim: Orthodontic treatment planning is a complex process requiring a detailed understanding of dental, skeletal, and soft tissue relationships. Traditionally, treatment decisions are made through clinical expertise and evidence-based guidelines. However, the recent evolution of AI, particularly large language models (LLMs), has warranted an evaluation of their capabilities in streamlining clinical workflows. The aim of this study was to evaluate the proficiency and effectiveness of AI-based LLMs, specifically OpenAI's ChatGPT-4o and Google's Gemini 2.0 Flash Experimental (free version), in generating orthodontic treatment plans based on real clinical cases.

Materials and methods: Ten published orthodontic case reports from reputed peer-reviewed journals were selected for the study and summarized into standardized clinical inputs, including patient age, occlusal relationships, skeletal and dental findings, and radiographic observations. These inputs were submitted to ChatGPT-4o and Gemini 2.0 Flash Experimental (free version) with prompts to generate extremely detailed, comprehensive treatment plans. The outputs were evaluated independently by two experienced orthodontists and one orthodontic resident using a four-point ordinal scale assessing clinical accuracy, completeness, and relevance of the treatment plan. Inter-rater reliability was assessed using Krippendorff's alpha.

Results: ChatGPT-4o produced treatment plans with higher clinical alignment and evaluator consensus, as indicated by Krippendorff's alpha (α = 0.935), while Gemini's plans showed greater variability and moderate agreement (α = 0.692). ChatGPT generated orthodontic treatment plans that incorporated more relevant clinical details and demonstrated stronger alignment with evidence-based standards, as assessed by the orthodontic reviewers. In contrast, Gemini generated treatment plans based on minimally accurate facts.

Conclusion: LLMs such as ChatGPT-4o and Gemini 2.0 Flash Experimental (free version) demonstrate potential as valuable complementary tools in orthodontic treatment planning, especially in routine cases, but do not appear to have the ability to replace clinical expertise.
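The abstract reports inter-rater reliability as Krippendorff's alpha but does not show how the statistic is computed. The sketch below illustrates one standard way to compute it for ordinal ratings, using a coincidence matrix and the ordinal distance metric. It is not the authors' code: the function name `krippendorff_alpha_ordinal` and the scores in `example_scores` are hypothetical, and the choice of the ordinal metric is an assumption based on the four-point ordinal scale mentioned in the methods.

```python
from collections import Counter


def krippendorff_alpha_ordinal(ratings, categories):
    """Krippendorff's alpha with the ordinal distance metric.

    ratings: one list per rated unit (e.g. one treatment plan), holding
             each rater's score; None marks a missing score.
    categories: all possible scores, in ascending order.
    """
    k = len(categories)
    index = {value: i for i, value in enumerate(categories)}

    # Coincidence matrix: each unit contributes its ordered pairs of
    # scores, weighted by 1 / (number of scores in that unit - 1).
    coincidence = [[0.0] * k for _ in range(k)]
    for unit in ratings:
        values = [index[v] for v in unit if v is not None]
        m = len(values)
        if m < 2:
            continue  # a unit with fewer than two scores has no pairs
        counts = Counter(values)
        for c, n_c in counts.items():
            for d, n_d in counts.items():
                pairs = n_c * (n_d - 1) if c == d else n_c * n_d
                coincidence[c][d] += pairs / (m - 1)

    totals = [sum(row) for row in coincidence]
    n = sum(totals)

    def delta_squared(c, d):
        # Ordinal metric: squared count of scores lying between the two
        # categories, counting half of each endpoint category.
        lo, hi = min(c, d), max(c, d)
        if lo == hi:
            return 0.0
        between = sum(totals[lo:hi + 1]) - (totals[lo] + totals[hi]) / 2
        return between ** 2

    observed = sum(coincidence[c][d] * delta_squared(c, d)
                   for c in range(k) for d in range(k))
    expected = sum(totals[c] * totals[d] * delta_squared(c, d)
                   for c in range(k) for d in range(k)) / (n - 1)
    if expected == 0:
        return float("nan")  # undefined when every score is identical
    return 1.0 - observed / expected


# Hypothetical scores: five plans, each scored 1-4 by three raters.
# These numbers are invented for illustration, not taken from the study.
example_scores = [
    [4, 4, 4],
    [3, 3, 4],
    [2, 2, 2],
    [4, 3, 4],
    [1, 1, 2],
]
print(krippendorff_alpha_ordinal(example_scores, [1, 2, 3, 4]))
```

An alpha of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is the scale on which the reported values of 0.935 and 0.692 should be read.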

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4fa6/12398357/554b0f899850/cureus-0017-00000089149-i01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4fa6/12398357/55bf971e4527/cureus-0017-00000089149-i02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4fa6/12398357/d5850edb87d9/cureus-0017-00000089149-i03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4fa6/12398357/0dfa03ddb67b/cureus-0017-00000089149-i04.jpg
