Stroop Anna, Stroop Tabea, Zawy Alsofy Samer, Wegner Moritz, Nakamura Makoto, Stroop Ralf
Faculty of Health, Department of Medicine, Witten-Herdecke University, Witten, Germany.
Philipps-University Marburg, Marburg, Germany.
Br J Clin Pharmacol. 2025 Aug;91(8):2294-2303. doi: 10.1002/bcp.70036. Epub 2025 Mar 11.
This study aimed to evaluate the accuracy and completeness of GPT-4, a large language model, in answering clinical pharmacological questions related to pain therapy, and to assess its reliability and potential as a tool for delivering patient-facing medical information in the context of pain management.
A cross-sectional, survey-based study was conducted with healthcare professionals, including physicians and pharmacists. Participants submitted up to 8 clinical pharmacology questions on pain management, focusing on drug interactions, dosages and contraindications. GPT-4's responses were rated for comprehensibility, level of detail, satisfaction, medical-pharmacological accuracy and completeness, and were additionally compared against the German Drug Directory to verify their accuracy.
The majority of participants (99%) found GPT-4's responses comprehensible, and 84% considered the information sufficiently detailed. Overall satisfaction was high (93%), and 96% deemed the responses medically accurate. However, only 63% rated the information as complete, with some identifying gaps in pharmacokinetics and drug-interaction data. Usability was rated good to excellent, with a System Usability Scale score of 83.38 (± 10.26).
GPT-4 demonstrates potential as a tool for delivering medical information, particularly in pain management. However, limitations such as incomplete pharmacological data and the potential for contextual carryover in follow-up questions suggest that further refinement is necessary. Developing specialized artificial intelligence tools that integrate real-time pharmacological databases could improve accuracy and reliability for clinical decision-making.