Accuracy and consistency of chatbots versus clinicians for answering pediatric dentistry questions: A pilot study.

Author Information

Department of Pediatric Dentistry, University of Alabama at Birmingham, Birmingham, AL, USA.

Publication Information

J Dent. 2024 May;144:104938. doi: 10.1016/j.jdent.2024.104938. Epub 2024 Apr 3.

Abstract

OBJECTIVES

Artificial intelligence applications such as large language models (LLMs) can simulate human-like conversation, but their potential in healthcare has not been fully evaluated. This pilot study assessed the accuracy and consistency of chatbots and clinicians in answering common questions in pediatric dentistry.

METHODS

Two expert pediatric dentists developed thirty true-or-false questions covering different aspects of pediatric dentistry. Publicly accessible chatbots (Google Bard, ChatGPT 4, ChatGPT 3.5, Llama, Sage, Claude 2 100k, Claude-instant, Claude-instant-100k, and Google Palm) answered the questions, each in three independent new conversations. Three groups of clinicians (general dentists, pediatric specialists, and students; n = 20 per group) also answered. Responses were graded by two pediatric dentistry faculty members together with a third, independent pediatric dentist. The resulting accuracies (percentage of correct responses) were compared using analysis of variance (ANOVA), and post-hoc pairwise group comparisons were corrected with Tukey's HSD method. Cronbach's alpha was calculated to assess consistency.
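
To make the analysis concrete, here is a minimal sketch in Python of the statistical pipeline described above (one-way ANOVA, Tukey's HSD post-hoc correction, and Cronbach's alpha). This is not the authors' analysis code: the group names, the SciPy/statsmodels calls, and all scores below are illustrative assumptions with random placeholder data, not study results.

```python
# Minimal sketch of the abstract's analysis pipeline; all data are fabricated
# placeholders for illustration, not the study's measurements.
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(42)

# Hypothetical per-responder accuracy (fraction of 30 questions correct).
groups = {
    "pediatric_dentists": rng.uniform(0.90, 1.00, 20),
    "general_dentists": rng.uniform(0.80, 0.95, 20),
    "students": rng.uniform(0.70, 0.90, 20),
}

# One-way ANOVA: do the group mean accuracies differ?
f_stat, p_value = stats.f_oneway(*groups.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4g}")

# Tukey's HSD corrects the post-hoc pairwise group comparisons.
scores = np.concatenate(list(groups.values()))
labels = np.repeat(list(groups.keys()), [len(v) for v in groups.values()])
print(pairwise_tukeyhsd(scores, labels))

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (observations x items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

# E.g. one chatbot's three conversations, scored 1/0 per question;
# alpha > 0.7 is the acceptability threshold used in the abstract.
conversations = rng.integers(0, 2, size=(30, 3))
print(f"Cronbach's alpha: {cronbach_alpha(conversations):.2f}")
```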

RESULTS

Pediatric dentists were significantly more accurate (mean ± SD: 96.67 % ± 4.3 %) than the other clinician groups and the chatbots (p < 0.001). General dentists (88.0 % ± 6.1 %) also demonstrated significantly higher accuracy than the chatbots (p < 0.001), followed by students (80.8 % ± 6.9 %). ChatGPT showed the highest accuracy (78 % ± 3 %) among the chatbots. All chatbots except ChatGPT 3.5 showed acceptable consistency (Cronbach's alpha > 0.7).

CLINICAL SIGNIFICANCE

Based on this pilot study, chatbots may be valuable adjuncts for educational purposes and for distributing information to patients. However, they are not yet ready to serve as substitutes for human clinicians in diagnostic decision-making.

CONCLUSION

In this pilot study, chatbots were less accurate than dentists and cannot yet be recommended for clinical pediatric dentistry.

