


Assessing the performance of ChatGPT in medical ethical decision-making: a comparative study with USMLE-based scenarios.

Authors

Khan Ali A, Khan Ali R, Munshi Saminah, Dandapani Hari, Jimale Mohamed, Bogni Franck M, Khawaja Hussain

Affiliations

Warren Alpert Medical School, Brown University, Providence, Rhode Island, USA.

The University of Texas Medical Branch at Galveston, Galveston, Texas, USA.

Publication

J Med Ethics. 2025 Jan 25. doi: 10.1136/jme-2024-110240.

DOI: 10.1136/jme-2024-110240
PMID: 39863417
Abstract

INTRODUCTION

The integration of artificial intelligence (AI) into healthcare introduces innovative possibilities but raises ethical, legal and professional concerns. Assessing the performance of AI in core components of the United States Medical Licensing Examination (USMLE), such as communication skills, ethics, empathy and professionalism, is crucial. This study evaluates how well ChatGPT versions 3.5 and 4.0 handle complex medical scenarios using USMLE-Rx, AMBOSS and UWorld question banks, aiming to understand its ability to navigate patient interactions according to medical ethics and standards.

METHODS

We compiled 273 questions from AMBOSS, USMLE-Rx and UWorld, focusing on communication, social sciences, healthcare policy and ethics. GPT-3.5 and GPT-4 were tasked with answering and justifying their choices in new chat sessions to minimise model interference. Responses were compared against question bank rationales and average student performance to evaluate AI effectiveness in medical ethical decision-making.
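The protocol described above (each question posed in a fresh chat session to minimise interference, the chosen answer scored against the bank's key, and the model's free-text rationale set aside for comparison with the bank's explanation) can be sketched as a small evaluation harness. This is a minimal illustration under stated assumptions, not the authors' code: the `ask_model` stub stands in for a real GPT-3.5/GPT-4 API call, and all names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Question:
    stem: str
    choices: dict          # e.g. {"A": "...", "B": "..."}
    correct: str           # key of the correct choice per the question bank
    bank_rationale: str    # explanation from AMBOSS / USMLE-Rx / UWorld

def ask_model(question: Question) -> tuple[str, str]:
    """Stand-in for one fresh chat session with the model.

    A real implementation would send the stem and choices to the model
    in a brand-new conversation (no shared history, so earlier items
    cannot influence later answers) and parse the reply into
    (chosen_key, free-text rationale). Here we return a fixed answer.
    """
    return "A", "Respecting patient autonomy requires ..."

def evaluate(questions: list[Question]) -> dict:
    """Score answer accuracy; collect rationales for manual grading."""
    answered_correctly = 0
    rationales = []  # (question, model_rationale) pairs
    for q in questions:
        choice, rationale = ask_model(q)
        if choice == q.correct:
            answered_correctly += 1
        rationales.append((q, rationale))
    return {
        "accuracy": answered_correctly / len(questions),
        "rationales": rationales,
    }
```

In the study, rationale accuracy was judged separately from answer accuracy by comparing each model explanation with the bank's own rationale; in this sketch that comparison is left as a manual review of the collected `rationales`.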

RESULTS

GPT-3.5 answered 38.9% correctly in AMBOSS, 54.1% in USMLE-Rx and 57.4% in UWorld, with rationale accuracy rates of 83.3%, 90.0% and 87.0%, respectively. GPT-4 answered 75.9% correctly in AMBOSS, 64.9% in USMLE-Rx and 79.6% in UWorld, with rationale accuracy rates of 85.4%, 88.9% and 98.8%, respectively. Both versions generally scored below average student performance, except GPT-4 in UWorld.

CONCLUSION

ChatGPT, particularly version 4.0, shows potential in navigating ethical and interpersonal medical scenarios. However, human reasoning currently surpasses AI in average performance. Continued development and training of AI systems can enhance proficiency in these critical healthcare aspects.


Similar Articles

1
Assessing the performance of ChatGPT in medical ethical decision-making: a comparative study with USMLE-based scenarios.
J Med Ethics. 2025 Jan 25. doi: 10.1136/jme-2024-110240.
2
Advancements in AI Medical Education: Assessing ChatGPT's Performance on USMLE-Style Questions Across Topics and Difficulty Levels.
Cureus. 2024 Dec 24;16(12):e76309. doi: 10.7759/cureus.76309. eCollection 2024 Dec.
3
Comparing ChatGPT and GPT-4 performance in USMLE soft skill assessments.
Sci Rep. 2023 Oct 1;13(1):16492. doi: 10.1038/s41598-023-43436-9.
4
Unveiling GPT-4V's hidden challenges behind high accuracy on USMLE questions: Observational Study.
J Med Internet Res. 2025 Feb 7;27:e65146. doi: 10.2196/65146.
5
Pure Wisdom or Potemkin Villages? A Comparison of ChatGPT 3.5 and ChatGPT 4 on USMLE Step 3 Style Questions: Quantitative Analysis.
JMIR Med Educ. 2024 Jan 5;10:e51148. doi: 10.2196/51148.
6
Performance of ChatGPT on Ophthalmology-Related Questions Across Various Examination Levels: Observational Study.
JMIR Med Educ. 2024 Jan 18;10:e50842. doi: 10.2196/50842.
7
ChatGPT-4 Omni Performance in USMLE Disciplines and Clinical Skills: Comparative Analysis.
JMIR Med Educ. 2024 Nov 6;10:e63430. doi: 10.2196/63430.
8
How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment.
JMIR Med Educ. 2023 Feb 8;9:e45312. doi: 10.2196/45312.
9
ChatGPT Performs Worse on USMLE-Style Ethics Questions Compared to Medical Knowledge Questions.
Appl Clin Inform. 2024 Oct;15(5):1049-1055. doi: 10.1055/a-2405-0138. Epub 2024 Aug 29.
10
Evaluating Bard Gemini Pro and GPT-4 Vision Against Student Performance in Medical Visual Question Answering: Comparative Case Study.
JMIR Form Res. 2024 Dec 17;8:e57592. doi: 10.2196/57592.

Cited By

1
Ethical and Legal Governance of Generative AI in Chinese Healthcare.
J Multidiscip Healthc. 2025 Sep 1;18:5405-5419. doi: 10.2147/JMDH.S541271. eCollection 2025.
2
Global Health care Professionals' Perceptions of Large Language Model Use In Practice: Cross-Sectional Survey Study.
JMIR Med Educ. 2025 May 12;11:e58801. doi: 10.2196/58801.