David Bull, Dide Okaygoun
Trauma and Orthopaedics, Chelsea and Westminster Hospital NHS Foundation Trust, London, GBR.
Intensive Care Unit, Barts Health NHS Trust, London, GBR.
Cureus. 2024 Nov 4;16(11):e73003. doi: 10.7759/cureus.73003. eCollection 2024 Nov.
Objective: With the rapid advancement of artificial intelligence (AI) technologies, models such as Chat Generative Pre-Trained Transformer (ChatGPT) are increasingly being evaluated for potential applications in healthcare. The Prescribing Safety Assessment (PSA) is a standardised test used in the UK to evaluate the prescribing competence of junior physicians. This study assesses ChatGPT's ability to pass the PSA and its performance across the different exam sections.

Methodology: ChatGPT (GPT-4) was tested on four official PSA practice papers, each containing 30 questions, in three independent trials per paper, with answers marked against the official PSA mark schemes. Performance was measured by calculating overall percentage scores and comparing them with the pass marks provided for each practice paper. Subsection performance was also analysed to identify strengths and weaknesses.

Results: ChatGPT achieved mean scores of 257/300 (85.67%), 236/300 (78.67%), 199/300 (66.33%), and 233/300 (77.67%) across the four papers, consistently surpassing the pass marks where available. ChatGPT performed well in sections requiring factual recall, scoring 63/72 (87.50%) in "Adverse Drug Reactions" and 63/72 (87.50%) in "Communicating Information". However, it struggled in "Data Interpretation", scoring 32/72 (44.44%) with marked variability across trials, indicating limitations in handling more complex clinical reasoning tasks.

Conclusion: While ChatGPT demonstrated strong potential in passing the PSA and excelled in sections requiring factual knowledge, its weakness in data interpretation highlights current gaps in AI's ability to fully replicate human clinical judgement. ChatGPT shows promise in supporting safe prescribing, particularly in areas prone to human error, such as drug interactions and communicating correct information. Given its variability on more complex reasoning tasks, however, ChatGPT is not yet ready to replace human prescribers and should instead serve as a supplementary tool in clinical practice.
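As a quick check on the reported figures, the short Python sketch below recomputes each percentage from the raw marks given in the Results. It is an illustration of the scoring arithmetic only, not the authors' analysis code; the paper labels are placeholders, and the denominators are taken directly from the abstract.

# Illustrative sketch (not the authors' code): reproduces the percentage
# arithmetic reported in the abstract from the raw marks.
MAX_PAPER_MARKS = 300   # mean score per paper is reported out of 300
MAX_SECTION_MARKS = 72  # subsection scores are reported out of 72

paper_scores = {"Paper 1": 257, "Paper 2": 236, "Paper 3": 199, "Paper 4": 233}
section_scores = {
    "Adverse Drug Reactions": 63,
    "Communicating Information": 63,
    "Data Interpretation": 32,
}

for paper, marks in paper_scores.items():
    print(f"{paper}: {marks}/{MAX_PAPER_MARKS} = {marks / MAX_PAPER_MARKS:.2%}")

for section, marks in section_scores.items():
    print(f"{section}: {marks}/{MAX_SECTION_MARKS} = {marks / MAX_SECTION_MARKS:.2%}")

Running this prints, for example, "Paper 1: 257/300 = 85.67%" and "Data Interpretation: 32/72 = 44.44%", matching the figures reported above.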