Non-technical Skills for Urology Trainees: A Double-Blinded Study of ChatGPT4 AI Benchmarking Against Consultant Interaction.

Authors

Pears Matthew, Wadhwa Karan, Payne Stephen R, Hanchanale Vishwanath, Elmamoun Mamoun Hamid, Jain Sunjay, Konstantinidis Stathis Th, Rochester Mark, Doherty Ruth, Spearpoint Kenneth, Ng Oliver, Dick Lachlan, Yule Steven, Biyani Chandra Shekhar

Affiliations

School of Health Sciences, University of Nottingham, Nottingham, UK.

Department of Urology, Broomfield Hospital, Chelmsford, UK.

Publication information

J Healthc Inform Res. 2024 Nov 14;9(1):103-118. doi: 10.1007/s41666-024-00180-7. eCollection 2025 Mar.

DOI:10.1007/s41666-024-00180-7

PMID:39897101

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11782744/

Abstract

Non-technical skills (NTS) are crucial in healthcare, encompassing cognitive and social skills that support technical ability. Traditional NTS training is evolving with the emergence of artificial intelligence (AI) models that can intelligently converse with their users, known as large language models (LLMs). This study investigated the capabilities and limitations of a popular model named generative pre-trained transformer 4 (GPT-4) in NTS training, comparing its performance to that of human evaluators. Urology trainees identified NTS events in simulated scenarios and discussed them in blinded feedback sessions with AI and human consultants. Experts assessed the blinded interaction data, providing quantitative ratings and qualitative evaluations using annotated transcripts. Wilcoxon signed-rank tests compared pre- and post-intervention ratings, whilst Mann-Whitney tests compared post-intervention ratings between AI and human feedback. Thematic analysis identified strengths, limitations, and differences between AI and human feedback approaches. The AI model demonstrated significant strengths in reinforcing knowledge gathering (p = 0.04), providing accurate and evidence-based feedback (p = 0.013), conveying empathy (p = 0.021), and tailoring explanations to complexity (p = 0.002). However, human feedback excelled in language terminology (p = 0.003), complexity (p = 0.020), and fact-based feedback (p = 0.025). The study highlights the potential for AI to augment assessment of NTS training in healthcare. A blended approach utilising AI and human expertise may boost training efficacy.
