Can AI chatbots accurately provide information on orthodontic risks?

Authors

Fan Zeng, Lei Jie, Shi Wanwei, Lin Yao, Wang Qing, Bao Lina

Affiliations

Orthodontic Resident, Department of Orthodontics, Stomatological Hospital, School of Stomatology, Southern Medical University, Guangzhou, China.

Orthodontic Resident, Department of Orthodontics, Changsha Stomatological Hospital, Changsha, Hunan Province, China.

Publication information

Angle Orthod. 2025 Jun 20;95(5):483-489. doi: 10.2319/121424-1021.1. eCollection 2025 Sep.

DOI:10.2319/121424-1021.1

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12422377/

Abstract

OBJECTIVES

To evaluate and compare the validity and reliability of different artificial intelligence (AI) chatbots in answering queries about potential orthodontic risks.

MATERIALS AND METHODS

Answers to 20 frequently asked questions about the potential risks of orthodontics were derived from daily consultations with experienced orthodontists and AI chatbots (ChatGPT 4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro). Each question was submitted to the AI chatbots three times to assess the reliability of their answers. The chatbots' answers were scored using a modified Global Quality Scale (GQS). Low- and high-threshold validity tests were used to determine validity, and Cronbach's alpha was used to evaluate the consistency of the three responses to each of the 20 questions.
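The two statistics named above can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes that scores are integers on a 1 to 5 scale, that the three repeated responses are treated as the "items" for Cronbach's alpha, and that a question counts as valid at a given threshold only when all three of its scores reach that threshold. The abstract does not state the cut-offs or the exact validity rule, so the function names `cronbach_alpha` and `validity_rate`, the thresholds, and the sample scores are all hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: one row per question, each row holding the k repeated scores.
    The k repetitions are treated as items (an assumption of this sketch).
    """
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every question needs the same number (>= 2) of scores")
    # Sample variance of each repetition (column) across questions.
    item_vars = [variance(col) for col in zip(*scores)]
    # Sample variance of the per-question total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def validity_rate(scores, threshold):
    """Fraction of questions whose repeated scores ALL reach the threshold.

    The all-responses rule is an assumption; the abstract does not define it.
    """
    valid = sum(1 for row in scores if min(row) >= threshold)
    return valid / len(scores)


# Hypothetical scores for four questions, three repetitions each.
example_scores = [
    [5, 5, 4],
    [4, 4, 4],
    [3, 4, 3],
    [5, 5, 5],
]
alpha = cronbach_alpha(example_scores)
low_rate = validity_rate(example_scores, threshold=4)   # assumed low cut-off
high_rate = validity_rate(example_scores, threshold=5)  # assumed high cut-off
```

Under these assumptions, a higher alpha means a chatbot's three answers to each question were scored more consistently, and the gap between the low- and high-threshold rates shows how many answers were acceptable but not top-rated.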

RESULTS

In the low-threshold validity test, Gemini exhibited the highest overall performance. In the high-threshold validity test, Gemini also showed the highest overall effectiveness, but no significant difference was observed among the three chatbots. All three chatbots demonstrated satisfactory reliability, with Gemini showing the highest consistency.

CONCLUSIONS

AI chatbots have some potential in providing orthodontic risk information, but they must be used cautiously and further optimized to improve their effectiveness in clinical practice.

