
Accuracy and Reliability of Artificial Intelligence Chatbots as Public Information Sources in Implant Dentistry.

Author Information

Yagcı Filiz, Eraslan Ravza, Albayrak Haydar, İpekten Funda

Publication Information

Int J Oral Maxillofac Implants. 2025 Jun 25;0(0):1-23. doi: 10.11607/jomi.11280.

Abstract

PURPOSE

The purpose of this study was to evaluate the accuracy, completeness, comprehensibility, and reliability of widely available AI chatbots in addressing clinically significant queries pertaining to implant dentistry.

MATERIALS AND METHODS

Twenty questions were devised based on those most frequently asked or encountered during patient consultations by three experienced prosthodontists. These questions were posed to three AI chatbots: ChatGPT-3.5, Gemini, and Copilot. Every question was asked to each chatbot three times at twelve-day intervals, and the authors independently graded the accuracy of the responses using a three-point Likert scale (grade 0: incorrect; grade 1: incomplete or partially correct; grade 2: correct) and a two-point scale (true/false). Completeness and comprehensibility were also evaluated on a three-point Likert scale. The five most frequently asked questions posed to each chatbot were analyzed. The chatbots' total scores were compared with one-way analysis of variance (ANOVA). Two-point scale data were analyzed with the chi-square test. The reliability of each chatbot's responses was assessed by calculating Cronbach's alpha coefficients for the consistency of the repeated responses.
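For readers who want to reproduce this kind of analysis, the following Python sketch illustrates the three statistical procedures named above (one-way ANOVA, chi-square test, and Cronbach's alpha). All arrays are illustrative placeholders, not the study's data, and cronbach_alpha is our own helper function, not a library API.

```python
# Minimal sketch of the statistical workflow described in the methods.
# All numbers below are illustrative placeholders, NOT study data.
import numpy as np
from scipy import stats

# Total accuracy scores for each chatbot (illustrative values only).
chatgpt = np.array([28, 33, 25, 30, 29, 27, 31, 26, 30])
gemini  = np.array([31, 35, 27, 32, 30, 28, 33, 29, 33])
copilot = np.array([29, 32, 26, 30, 29, 27, 31, 28, 30])

# One-way ANOVA comparing total scores across the three chatbots.
f_stat, p_anova = stats.f_oneway(chatgpt, gemini, copilot)

# Chi-square test on the two-point (true/false) scale:
# rows = chatbots, columns = counts of true vs. false responses.
contingency = np.array([[45, 15],
                        [48, 12],
                        [46, 14]])
chi2, p_chi2, dof, _ = stats.chi2_contingency(contingency)

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (questions x repetitions) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # repeated askings per question
    item_var = scores.var(axis=0, ddof=1)        # variance of each repetition
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_var.sum() / total_var)

# Consistency of the three repeated responses (illustrative grades).
repeated = np.array([[2, 2, 2],
                     [1, 2, 1],
                     [0, 1, 1],
                     [2, 2, 1]])
alpha = cronbach_alpha(repeated)

print(f"ANOVA: F={f_stat:.2f}, p={p_anova:.3f}")
print(f"Chi-square: chi2={chi2:.2f}, p={p_chi2:.3f}")
print(f"Cronbach's alpha: {alpha:.3f}")
```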

RESULTS

When the total scores of the chatbots were analyzed (ChatGPT-3.5 = 28.78 ± 4.06, Gemini = 30.89 ± 4.08, Copilot = 29.11 ± 3.22), one-way ANOVA revealed no statistically significant differences (P = .461). Chi-square analysis of the two-point scale data likewise revealed no statistically significant difference among the chatbots (P = .336). Gemini showed a higher completeness level than ChatGPT-3.5 (P = .011). There was no statistically significant difference among the AI chatbots in terms of comprehensibility. Copilot demonstrated the greatest overall consistency of the three chatbots, with a Cronbach's alpha of 0.863, followed by ChatGPT-3.5 (0.779) and Gemini (0.636).
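For context, the Cronbach's alpha values above follow the standard reliability formula (quoted here as a reference definition, not taken from the paper), where k is the number of repeated responses per question (three in this study), \sigma_i^2 is the variance of the i-th response set, and \sigma_t^2 is the variance of the total scores:

\[ \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_t^2}\right) \]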

CONCLUSIONS

The accuracy of the three chatbots was similar, and all three demonstrated an acceptable level of consistency. However, given the chatbots' low accuracy in answering questions, they should not serve as the sole decision-maker; the clinician's opinion must take priority.

