Department of Pediatric Dentistry, University of Alabama at Birmingham, Birmingham, AL, USA.
J Dent. 2024 May;144:104938. doi: 10.1016/j.jdent.2024.104938. Epub 2024 Apr 3.
Artificial intelligence (AI) applications such as large language models (LLMs) can simulate human-like conversation. The potential of LLMs in healthcare has not been fully evaluated. This pilot study assessed the accuracy and consistency of chatbots and clinicians in answering common questions in pediatric dentistry.
Two expert pediatric dentists developed thirty true-or-false questions covering different aspects of pediatric dentistry. Publicly accessible chatbots (Google Bard, ChatGPT-4, ChatGPT-3.5, Llama, Sage, Claude 2 100k, Claude-instant, Claude-instant-100k, and Google Palm) were used to answer the questions in three independent new conversations each. Three groups of clinicians (general dentists, pediatric dentistry specialists, and students; n = 20 per group) also answered. Responses were graded by two pediatric dentistry faculty members and a third independent pediatric dentist. The resulting accuracies (percentage of correct responses) were compared using analysis of variance (ANOVA), with post-hoc pairwise group comparisons corrected by Tukey's HSD method. Cronbach's alpha was calculated to assess consistency.
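To make the statistical pipeline concrete, here is a minimal Python sketch of the analyses named above: one-way ANOVA across groups, Tukey's HSD post-hoc correction, and Cronbach's alpha across repeated chatbot conversations. This is not the study's code; the simulated scores, the ~10% run-to-run disagreement rate, and the cronbach_alpha helper are illustrative assumptions, with only the group means, SDs, and n = 20 per group taken from this abstract.

```python
# Illustrative sketch only: simulated data shaped like the abstract's summary statistics.
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)

# Hypothetical per-respondent accuracies (% correct on 30 questions), n = 20 per group.
pediatric = rng.normal(96.7, 4.3, 20).clip(0, 100)
general = rng.normal(88.0, 6.1, 20).clip(0, 100)
students = rng.normal(80.8, 6.9, 20).clip(0, 100)

# One-way ANOVA across the three clinician groups.
f_stat, p_val = stats.f_oneway(pediatric, general, students)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_val:.4g}")

# Tukey's HSD corrects the post-hoc pairwise group comparisons.
scores = np.concatenate([pediatric, general, students])
groups = ["pediatric"] * 20 + ["general"] * 20 + ["student"] * 20
print(pairwise_tukeyhsd(scores, groups))

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (cases x items) score matrix;
    here, cases = 30 questions and items = 3 conversations."""
    items = np.asarray(items, dtype=float)
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    k = items.shape[1]
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Three correlated chatbot conversations scored on 30 items (1 = correct);
# the ~10% flip rate is an assumption. Alpha > 0.7 is the conventional threshold.
base = rng.integers(0, 2, size=30)
flips = rng.random((3, 30)) < 0.1
runs = np.where(flips, 1 - base, base)
print(f"Cronbach's alpha: {cronbach_alpha(runs.T):.2f}")
```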
Pediatric dentists were significantly more accurate (mean ± SD: 96.67% ± 4.3%) than the other clinicians and the chatbots (p < 0.001). General dentists (88.0% ± 6.1%) also demonstrated significantly higher accuracy than the chatbots (p < 0.001), followed by students (80.8% ± 6.9%). ChatGPT showed the highest accuracy among the chatbots (78% ± 3%). All chatbots except ChatGPT-3.5 showed acceptable consistency (Cronbach's alpha > 0.7).
Based on this pilot study, chatbots may be valuable adjuncts for educational purposes and for distributing information to patients. However, they are not yet ready to serve as substitutes for human clinicians in diagnostic decision-making.
In this pilot study, chatbots showed lower accuracy than dentists and are therefore not yet recommended for clinical use in pediatric dentistry.