Assessing the accuracy and quality of artificial intelligence (AI) chatbot-generated responses in making patient-specific drug-therapy and healthcare-related decisions.

Author information

Shiferaw Meron W, Zheng Taylor, Winter Abigail, Mike Leigh Ann, Chan Lingtak-Neander

Affiliation

School of Pharmacy, University of Washington, 1959 NE Pacific Street, Box 357630, Seattle, WA, 98195, USA.

Publication information

BMC Med Inform Decis Mak. 2024 Dec 24;24(1):404. doi: 10.1186/s12911-024-02824-5.

Abstract

BACKGROUND

Interactive artificial intelligence tools such as ChatGPT have gained popularity, yet little is known about their reliability as a reference tool for healthcare-related information for healthcare providers and trainees. The objective of this study was to assess the consistency, quality, and accuracy of the responses generated by ChatGPT on healthcare-related inquiries.

METHODS

A total of 18 open-ended questions, six in each of three defined clinical areas (2 each to address "what", "why", and "how"), were submitted to ChatGPT v3.5 based on real-world usage experience. The experiment was conducted in duplicate using 2 computers. Five investigators independently rated the quality of each of the bot's responses on a 4-point scale. The Delphi method was used to compare the investigators' scores, with the goal of reaching at least 80% consistency. The accuracy of the responses was checked against established professional references and resources. When a response was in question, the bot was asked to provide the reference material it had used so that the investigators could determine accuracy and quality. The investigators determined consistency, accuracy, and quality by consensus.
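The arithmetic of this design can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the clinical areas are not named in the abstract, so the labels below are placeholders, and the abstract does not say how the 80% consistency target was computed, so `modal_agreement` (the share of raters who gave the most common score) is a hypothetical stand-in rather than the investigators' method.

```python
from collections import Counter
from itertools import product

# Placeholder labels: the abstract does not name the three clinical areas.
AREAS = ["area_1", "area_2", "area_3"]
QUESTION_TYPES = ["what", "why", "how"]
QUESTIONS_PER_TYPE = 2  # two questions of each type per area
RUNS = 2                # the experiment was conducted in duplicate


def build_question_grid():
    """Enumerate one (area, type, index) tuple per question in the design."""
    return [
        (area, qtype, index)
        for area, qtype in product(AREAS, QUESTION_TYPES)
        for index in range(1, QUESTIONS_PER_TYPE + 1)
    ]


def modal_agreement(scores):
    """Fraction of raters whose score equals the most common score.

    Assumed measure for illustration; not taken from the study.
    """
    top_count = Counter(scores).most_common(1)[0][1]
    return top_count / len(scores)


def needs_another_round(scores, threshold=0.80):
    """True when agreement falls short of the consistency target."""
    return modal_agreement(scores) < threshold


questions = build_question_grid()
what_responses = sum(1 for q in questions if q[1] == "what") * RUNS

print(len(questions))                            # 18 questions in total
print(what_responses)                            # 12 "what" responses over both runs
print(needs_another_round([3, 3, 3, 3, 2]))      # False: 4 of 5 raters agree
print(needs_another_round([3, 3, 3, 2, 2]))      # True: only 3 of 5 agree
```

With five raters, four matching scores is the smallest majority that meets an 80% target under this assumed measure. The 12 "what" responses counted here are consistent with the denominator reported in the results.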

RESULTS

The speech pattern and length of the responses were consistent for the same user but differed between users. Occasionally, ChatGPT provided 2 completely different responses to the same question. Overall, ChatGPT provided more accurate responses to the "what" questions (8 out of 12) and performed less reliably on the "why" and "how" questions. We identified errors in calculation and units of measurement, as well as misuse of protocols, by ChatGPT. Some of these errors could result in clinical decisions leading to harm. We also identified citations and references shown by ChatGPT that did not exist in the literature.

CONCLUSIONS

ChatGPT is not ready to take on the coaching role for either healthcare learners or healthcare professionals. The lack of consistency in the responses to the same question is problematic for both learners and decision-makers. The intrinsic assumptions made by the chatbot could lead to erroneous clinical decisions. The unreliability in providing valid references is a serious flaw in using ChatGPT to drive clinical decision making.
