David Bull, Dide Okaygoun
Trauma and Orthopaedics, Chelsea and Westminster Hospital NHS Foundation Trust, London, GBR.
Intensive Care Unit, Barts Health NHS Trust, London, GBR.
Cureus. 2024 Nov 4;16(11):e73003. doi: 10.7759/cureus.73003. eCollection 2024 Nov.
Objective: With the rapid advancement of artificial intelligence (AI) technologies, models such as Chat Generative Pre-Trained Transformer (ChatGPT) are increasingly being evaluated for potential applications in healthcare. The Prescribing Safety Assessment (PSA) is a standardised test used in the UK to evaluate the prescribing competence of junior physicians. This study assesses ChatGPT's ability to pass the PSA and its performance across the different exam sections.

Methodology: ChatGPT (GPT-4) was tested on four official PSA practice papers, each containing 30 questions, in three independent trials per paper, with answers marked against the official PSA mark schemes. Performance was measured by calculating overall percentage scores and comparing them with the pass marks provided for each practice paper. Subsection performance was also analysed to identify strengths and weaknesses.

Results: ChatGPT achieved mean scores of 257/300 (85.67%), 236/300 (78.67%), 199/300 (66.33%), and 233/300 (77.67%) across the four papers, consistently surpassing the pass marks where available. ChatGPT performed well in sections requiring factual recall, scoring 63/72 (87.50%) in "Adverse Drug Reactions" and 63/72 (87.50%) in "Communicating Information". However, it struggled in "Data Interpretation", scoring 32/72 (44.44%) with marked variability across trials, indicating limitations in handling more complex clinical reasoning tasks.

Conclusion: While ChatGPT demonstrated strong potential in passing the PSA and excelled in sections requiring factual knowledge, its weakness in data interpretation highlights current gaps in AI's ability to fully replicate human clinical judgement. ChatGPT shows promise in supporting safe prescribing, particularly in areas prone to human error, such as drug interactions and communicating correct information. Given its variability on more complex reasoning tasks, however, ChatGPT is not yet ready to replace human prescribers and should instead serve as a supplementary tool in clinical practice.
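As a quick check on the reported figures, the short Python sketch below recomputes each percentage from the raw marks given in the Results. It is an illustration of the scoring arithmetic only, not the authors' analysis code; the paper labels are placeholders, and the denominators are taken directly from the abstract.

# Illustrative sketch (not the authors' code): reproduces the percentage
# arithmetic reported in the abstract from the raw marks.
MAX_PAPER_MARKS = 300   # mean score per paper is reported out of 300
MAX_SECTION_MARKS = 72  # subsection scores are reported out of 72

paper_scores = {"Paper 1": 257, "Paper 2": 236, "Paper 3": 199, "Paper 4": 233}
section_scores = {
    "Adverse Drug Reactions": 63,
    "Communicating Information": 63,
    "Data Interpretation": 32,
}

for paper, marks in paper_scores.items():
    print(f"{paper}: {marks}/{MAX_PAPER_MARKS} = {marks / MAX_PAPER_MARKS:.2%}")

for section, marks in section_scores.items():
    print(f"{section}: {marks}/{MAX_SECTION_MARKS} = {marks / MAX_SECTION_MARKS:.2%}")

Running this prints, for example, "Paper 1: 257/300 = 85.67%" and "Data Interpretation: 32/72 = 44.44%", matching the figures reported above.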