Johnson Ashish J, Singh Tarun Kumar, Gupta Aakash, Sankar Hariram, Gill Ikroop, Shalini Madhav, Mohan Neeraj
All India Institute of Medical Sciences (AIIMS), Bathinda, India.
Maulana Azad Institute of Dental Science, New Delhi, India.
Dent Traumatol. 2025 Apr;41(2):187-193. doi: 10.1111/edt.13000. Epub 2024 Oct 17.
This study aimed to assess the validity and reliability of four AI chatbots (Bing, ChatGPT 3.5, Google Gemini, and Claude AI) in answering frequently asked questions (FAQs) related to dental trauma.
A set of 30 FAQs was initially compiled from the responses of the four AI chatbots. A panel of expert endodontists and maxillofacial surgeons then refined this set to a final selection of 20 questions. Each question was entered into each chatbot three times, yielding 240 responses in total (20 questions × 4 chatbots × 3 repetitions). The responses were evaluated with the Global Quality Score (GQS) on a 5-point Likert scale (5: strongly agree; 4: agree; 3: neutral; 2: disagree; 1: strongly disagree), and any scoring disagreements were resolved through evidence-based discussion. Each question's responses were classified as valid or invalid at two thresholds: a low threshold (all three responses scored ≥ 4) and a high threshold (all three responses scored 5). A chi-squared test was used to compare the validity of the responses across chatbots, and Cronbach's alpha was calculated to assess reliability by measuring the consistency of each chatbot's repeated responses.
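A minimal sketch of how this analysis could be reproduced, assuming hypothetical GQS matrices (20 questions × 3 repetitions per chatbot); the abstract does not report the raw scores, so the data below are placeholders, and the helper functions are illustrative names, not the authors' code.

```python
# Illustrative sketch (not the authors' code): validity thresholding,
# chi-squared comparison, and Cronbach's alpha on hypothetical GQS data.
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical GQS scores: 20 questions x 3 repetitions per chatbot.
rng = np.random.default_rng(0)
scores = {
    "Claude AI": rng.integers(4, 6, size=(20, 3)),     # placeholder values
    "ChatGPT 3.5": rng.integers(3, 6, size=(20, 3)),
    "Google Gemini": rng.integers(3, 6, size=(20, 3)),
    "Bing": rng.integers(2, 6, size=(20, 3)),
}

def validity_counts(mat, threshold):
    """Count questions whose three repeated responses all meet the threshold."""
    valid = int((mat >= threshold).all(axis=1).sum())
    return valid, mat.shape[0] - valid

def cronbach_alpha(mat):
    """Cronbach's alpha, treating the three repetitions as parallel items."""
    k = mat.shape[1]
    item_vars = mat.var(axis=0, ddof=1).sum()      # per-repetition variances
    total_var = mat.sum(axis=1).var(ddof=1)        # variance of total scores
    return k / (k - 1) * (1 - item_vars / total_var)

# Low threshold: all three responses scored >= 4 (use 5 for the high threshold).
table_low = [validity_counts(m, 4) for m in scores.values()]
chi2, p, _, _ = chi2_contingency(table_low)
print(f"chi-squared = {chi2:.2f}, p = {p:.3f}")

for name, mat in scores.items():
    print(f"{name}: Cronbach's alpha = {cronbach_alpha(mat):.2f}")
```

This treats each chatbot's three repeated answers to a question as parallel "items" when computing alpha; the published analysis may have structured the reliability computation differently.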
The results indicated that Claude AI demonstrated superior validity and reliability compared with ChatGPT 3.5 and Google Gemini, whereas Bing was found to be less reliable. These findings underscore the need for authorities to establish strict guidelines to ensure the accuracy of medical information provided by AI chatbots.