Can ChatGPT outperform a neurosurgical trainee? A prospective comparative study.

Author information

Williams Simon C, Starup-Hansen Joachim, Funnell Jonathan P, Hanrahan John Gerrard, Valetopoulou Alexandra, Singh Navneet, Sinha Saurabh, Muirhead William R, Marcus Hani J

Affiliations

Department of Neurosurgery, St George's University Hospital, London, UK.

Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK.

Publication information

Br J Neurosurg. 2024 Feb 2:1-10. doi: 10.1080/02688697.2024.2308222.

Abstract

PURPOSE

This study aimed to compare the performance of ChatGPT, a large language model (LLM), with human neurosurgical applicants in a neurosurgical national selection interview, to assess the potential of artificial intelligence (AI) and LLMs in healthcare and provide insights into their integration into the field.

METHODS

In a prospective comparative study, a set of neurosurgical national selection-style interview questions was put to eight human participants and ChatGPT in an online interview. All participants were doctors currently practising in the UK who had applied for a neurosurgical National Training Number. Interviews were recorded, anonymised, and scored by three neurosurgical consultants with experience as interviewers for national selection. Answers provided by ChatGPT were used as a template for a virtual interview. Interview transcripts were subsequently scored by neurosurgical consultants using the criteria used in real national selection interviews. Overall interview score and subdomain scores were compared between human participants and ChatGPT.

RESULTS

For overall score, ChatGPT fell behind six human competitors and did not achieve a mean score higher than that of any individual who achieved a training position. Several factors, including factual inaccuracies and deviations from expected structure and style, may have contributed to ChatGPT's underperformance.

CONCLUSIONS

LLMs such as ChatGPT have huge potential for integration in healthcare. However, this study emphasises the need for further development to address limitations and challenges. While LLMs have not surpassed human performance yet, collaboration between humans and AI systems holds promise for the future of healthcare.


