Fan Zeng, Lei Jie, Shi Wanwei, Lin Yao, Wang Qing, Bao Lina
Orthodontic Resident, Department of Orthodontics, Stomatological Hospital, School of Stomatology, Southern Medical University, Guangzhou, China.
Orthodontic Resident, Department of Orthodontics, Changsha Stomatological Hospital, Changsha, Hunan Province, China.
Angle Orthod. 2025 Jun 20;95(5):483-489. doi: 10.2319/121424-1021.1. eCollection 2025 Sep.
To evaluate and compare the validity and reliability of different artificial intelligence (AI) chatbots in answering queries about potential orthodontic risks.
Answers to 20 frequently asked questions about the potential risks of orthodontics were derived from daily consultations with experienced orthodontists and AI chatbots (ChatGPT 4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro). Each question was submitted to each AI chatbot three times to assess the reliability of its answers. The chatbots' answers were scored using a modified Global Quality Scale (GQS). Low- and high-threshold validity tests were used to determine validity, and Cronbach's alpha was used to evaluate the consistency of the three responses to each of the 20 questions.
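The reliability statistic named above can be illustrated with a short sketch. The abstract does not describe how Cronbach's alpha was computed, so the following uses the standard textbook formula, treating the three repeated responses as the "items" and the questions as the "subjects"; the function name `cronbach_alpha` and the scores are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: one row per question, each row holding the k scores given
    to the k repeated responses (assumed layout, not from the study).
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two repeated responses per question")

    # Sample variance of each repetition across all questions.
    item_variances = [variance(column) for column in zip(*scores)]

    # Sample variance of the per-question total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical GQS-style scores for four questions, three repetitions each.
example_scores = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
]
alpha = cronbach_alpha(example_scores)
print(f"alpha = {alpha:.3f}")
```

Values closer to 1 indicate that the repeated responses were scored more consistently; the threshold for a "satisfactory" level is a convention chosen by the analyst and is not stated in the abstract.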
In the low-threshold validity test, Gemini exhibited the highest overall performance. In the high-threshold validity test, Gemini also showed the highest overall effectiveness, but no significant difference was observed among the three chatbots. All three chatbots demonstrated satisfactory reliability, with Gemini showing the highest consistency.
AI chatbots have some potential in providing orthodontic risk information, but they must be used cautiously and further optimized to improve their effectiveness in clinical practice.