van Nuland Merel, Lobbezoo Anne-Fleur H, van de Garde Ewoudt M W, Herbrink Maikel, van Heijl Inger, Bognàr Tim, Houwen Jeroen P A, Dekens Marloes, Wannet Demi, Egberts Toine, van der Linden Paul D
Department of Clinical Pharmacy, Tergooi Medical Center, Hilversum, the Netherlands.
Department of Pharmacy, St. Antonius Hospital, Utrecht, Nieuwegein, the Netherlands.
Explor Res Clin Soc Pharm. 2024 Jun 13;15:100464. doi: 10.1016/j.rcsop.2024.100464. eCollection 2024 Sep.
The advent of Large Language Models (LLMs) such as ChatGPT introduces opportunities within the medical field. Nonetheless, the use of LLMs poses a risk when healthcare practitioners and patients present clinical questions to these programs without a comprehensive understanding of their suitability for clinical contexts.
The objective of this study was to assess ChatGPT's ability to generate appropriate responses to clinical questions that hospital pharmacists could encounter during routine patient care.
Thirty questions from 10 different domains within clinical pharmacy were collected during routine care. Questions were presented to ChatGPT in a standardized format, including the patient's age, sex, drug name, dose, and indication. Subsequently, relevant information regarding the specific case was provided, and the prompt was concluded with the query "what would a hospital pharmacist do?". The impact on accuracy was assessed for each domain by modifying the personification to "what would you do?", presenting the question in Dutch, and regenerating the primary question. All responses were independently evaluated by two senior hospital pharmacists, focusing on the availability of advice, accuracy, and concordance.
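To illustrate the standardized prompt structure described above, a minimal sketch is given below. The study queried the ChatGPT interface directly; this example instead assumes the OpenAI Chat Completions API, and the helper function, prompt wording, model choice, and example case are hypothetical, not taken from the study.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def build_prompt(age: int, sex: str, drug: str, dose: str, indication: str,
                 case_details: str, personification: str = "a hospital pharmacist") -> str:
    """Assemble a standardized clinical-pharmacy question (hypothetical wording)."""
    return (
        f"Patient: {age}-year-old {sex}. "
        f"Drug: {drug}, dose: {dose}, indication: {indication}. "
        f"{case_details} "
        f"What would {personification} do?"
    )


# Hypothetical example case, not drawn from the 30 study questions.
prompt = build_prompt(
    age=72, sex="male", drug="apixaban", dose="5 mg twice daily",
    indication="atrial fibrillation",
    case_details="The patient's renal function has declined (eGFR 25 mL/min).",
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # the model version used in the study is not stated in the abstract
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

Changing the personification argument to "you" reproduces the variant question used to test the impact of personification on accuracy.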
For 77% of questions, ChatGPT provided advice in response. For these responses, accuracy and concordance were determined. Responses were correct and complete for 26%, correct but incomplete for 22%, partially correct and partially incorrect for 30%, and completely incorrect for 22%. Reproducibility was poor, with merely 10% of responses remaining consistent upon regeneration of the primary question.
While the concordance of responses was excellent, accuracy and reproducibility were poor. With the described method, ChatGPT should not be used to address questions encountered by hospital pharmacists during their shifts. However, it is important to acknowledge the limitations of our methodology, including potential biases, which may have influenced the findings.