


Can ChatGPT pass the urology fellowship examination? Artificial intelligence capability in surgical training assessment.

Author Information

Lockhart Kathleen, Canagasingham Ashan, Zhong Wenjie, Ashrafi Darius, March Brayden, Cole-Clark Dane, Grant Alice, Chung Amanda

Affiliations

Royal North Shore Hospital, St Leonards, New South Wales, Australia.

Wollongong Hospital, Wollongong, New South Wales, Australia.

Publication Information

BJU Int. 2025 Sep;136(3):523-528. doi: 10.1111/bju.16806. Epub 2025 Jun 19.

DOI: 10.1111/bju.16806
PMID: 40538057
Abstract

OBJECTIVES

To assess the performance of ChatGPT compared to human trainees in the Australian Urology written fellowship examination (essay format).

MATERIALS AND METHODS

Each examination was marked independently by two blinded examining urologists and assessed for: overall pass/failure; proportion of passing questions; and adjusted aggregate score. Examining urologists also made a blinded judgement as to authorship (artificial intelligence [AI] or trainee).
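
As a reading aid only, the sketch below (Python) shows one plausible way the three outcome measures could be computed from two examiners' per-question marks. The percentage marking scale, the 50% pass mark, and the averaging of the two examiners' marks are illustrative assumptions; the abstract does not specify the marking or adjustment scheme.

    PASS_MARK = 50.0  # assumed pass mark; not stated in the abstract

    def score_examination(examiner_a, examiner_b):
        """Return (passed, proportion_passing, aggregate) for one paper.

        examiner_a, examiner_b: per-question percentage marks from the two
        blinded examiners, in the same question order (hypothetical inputs).
        """
        # Average the two blinded examiners' marks per question (assumption).
        per_question = [(a + b) / 2 for a, b in zip(examiner_a, examiner_b)]
        # Proportion of questions at or above the pass mark.
        proportion_passing = sum(m >= PASS_MARK for m in per_question) / len(per_question)
        # Aggregate score across questions; pass/fail judged on the aggregate.
        aggregate = sum(per_question) / len(per_question)
        return aggregate >= PASS_MARK, proportion_passing, aggregate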

RESULTS

A total of 20 examination papers were marked: 10 completed by urology trainees and 10 by AI platforms (half each on ChatGPT-3.5 and ChatGPT-4.0). Overall, 9/10 trainees passed the urology fellowship examination, whereas only 6/10 ChatGPT examinations passed (P = 0.3). Of the failing ChatGPT examinations, 3/4 were undertaken on the ChatGPT-3.5 platform. The proportion of passing questions per examination was higher for trainees than for ChatGPT: mean 89.4% vs 80.9% (P = 0.2). The adjusted aggregate scores of trainees were also higher than those of ChatGPT, by a small margin: mean 79.2% vs 78.1% (P = 0.8). ChatGPT-3.5 and ChatGPT-4.0 achieved similar aggregate scores (78.9% and 77.4%, P = 0.8), although ChatGPT-3.5 had a lower percentage of passing questions per examination: mean 79.6% vs 82.1% (P = 0.8). Two examinations were incorrectly attributed by the examining urologists (both trainee candidates perceived to be ChatGPT); the sensitivity for identifying ChatGPT authorship was therefore 100%, and overall accuracy was 91.7%.
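
For clarity on the authorship-identification figures: sensitivity is the share of ChatGPT papers correctly flagged as ChatGPT, and overall accuracy is the share of correct judgements of either kind. The abstract does not state the total number of judgements, so the counts below are hypothetical, chosen only so the standard formulas reproduce the reported 100% sensitivity and 91.7% accuracy (22 of 24 judgements correct).

    # Hypothetical confusion matrix for the blinded authorship judgements.
    tp = 12  # ChatGPT papers judged ChatGPT (assumed count)
    fn = 0   # ChatGPT papers judged trainee (implied by 100% sensitivity)
    fp = 2   # trainee papers judged ChatGPT (stated in the abstract)
    tn = 10  # trainee papers judged trainee (assumed count)

    sensitivity = tp / (tp + fn)                # 12/12 -> 100%
    accuracy = (tp + tn) / (tp + fn + fp + tn)  # 22/24 -> 91.7%
    print(f"sensitivity={sensitivity:.1%} accuracy={accuracy:.1%}")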

CONCLUSION

Overall, ChatGPT did not perform as well as human trainees in the Australian Urology fellowship written examination. Examiners were able to identify AI-generated answers with a high degree of accuracy.


Similar Articles

1. Can ChatGPT pass the urology fellowship examination? Artificial intelligence capability in surgical training assessment.
   BJU Int. 2025 Sep;136(3):523-528. doi: 10.1111/bju.16806. Epub 2025 Jun 19.
2. Clinical Performance and Communication Skills of ChatGPT Versus Physicians in Emergency Medicine: Simulated Patient Study.
   JMIR Med Inform. 2025 Jul 17;13:e68409. doi: 10.2196/68409.
3. The Growing Role of Artificial Intelligence in Surgical Education: ChatGPT Undertakes the Australian Generic Surgical Sciences Examination.
   ANZ J Surg. 2025 Jul-Aug;95(7-8):1350-1355. doi: 10.1111/ans.70186. Epub 2025 May 30.
4. Artificial intelligence in radiology examinations: a psychometric comparison of question generation methods.
   Diagn Interv Radiol. 2025 Jul 21. doi: 10.4274/dir.2025.253407.
5. Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis.
   J Med Internet Res. 2024 Jul 25;26:e60807. doi: 10.2196/60807.
6. The development of surgical ability during pediatric urology fellowship and its evolution in the early years of practice.
   J Pediatr Urol. 2024 Dec;20(6):1035-1043. doi: 10.1016/j.jpurol.2024.08.013. Epub 2024 Aug 31.
7. Utility of Generative Artificial Intelligence for Japanese Medical Interview Training: Randomized Crossover Pilot Study.
   JMIR Med Educ. 2025 Aug 1;11:e77332. doi: 10.2196/77332.
8. Performance of ChatGPT-4 Omni and Gemini 1.5 Pro on Ophthalmology-Related Questions in the Turkish Medical Specialty Exam.
   Turk J Ophthalmol. 2025 Aug 21;55(4):177-185. doi: 10.4274/tjo.galenos.2025.27895.
9. Battle of the artificial intelligence: a comprehensive comparative analysis of DeepSeek and ChatGPT for urinary incontinence-related questions.
   Front Public Health. 2025 Jul 23;13:1605908. doi: 10.3389/fpubh.2025.1605908. eCollection 2025.
10. Assessing the Reproducibility of the Structured Abstracts Generated by ChatGPT and Bard Compared to Human-Written Abstracts in the Field of Spine Surgery: Comparative Analysis.
    J Med Internet Res. 2024 Jun 26;26:e52001. doi: 10.2196/52001.