Comparing ChatGPT and GPT-4 performance in USMLE soft skill assessments.

Affiliations

Department of Diagnostic Imaging, Chaim Sheba Medical Center, Ramat Gan, Israel.

Faculty of Medicine, Tel-Aviv University, Tel-Aviv, Israel.

Publication information

Sci Rep. 2023 Oct 1;13(1):16492. doi: 10.1038/s41598-023-43436-9.

DOI:10.1038/s41598-023-43436-9
PMID:37779171
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10543445/
Abstract

The United States Medical Licensing Examination (USMLE) has been a subject of performance study for artificial intelligence (AI) models. However, their performance on questions involving USMLE soft skills remains unexplored. This study aimed to evaluate ChatGPT and GPT-4 on USMLE questions involving communication skills, ethics, empathy, and professionalism. We used 80 USMLE-style questions involving soft skills, taken from the USMLE website and the AMBOSS question bank. A follow-up query was used to assess the models' consistency. The performance of the AI models was compared to that of previous AMBOSS users. GPT-4 outperformed ChatGPT, correctly answering 90% compared to ChatGPT's 62.5%. GPT-4 showed more confidence, not revising any responses, while ChatGPT modified its original answers 82.5% of the time. The performance of GPT-4 was higher than that of AMBOSS's past users. Both AI models, notably GPT-4, showed capacity for empathy, indicating AI's potential to meet the complex interpersonal, ethical, and professional demands intrinsic to the practice of medicine.
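
The abstract reports percentages only. As a rough sketch, they can be converted back into question counts, assuming every percentage is taken over the same 80 questions (the abstract does not state the denominators explicitly, so this is an assumption). The names `N_QUESTIONS`, `reported_percentages`, and `to_count` are illustrative and do not come from the paper.

```python
# Assumption: all three percentages are fractions of the same 80 questions.
N_QUESTIONS = 80

# Percentages as reported in the abstract.
reported_percentages = {
    "GPT-4 correct": 90.0,
    "ChatGPT correct": 62.5,
    "ChatGPT answers revised after follow-up": 82.5,
}


def to_count(percent, total=N_QUESTIONS):
    """Convert a reported percentage into a whole number of questions."""
    return round(percent / 100 * total)


counts = {label: to_count(p) for label, p in reported_percentages.items()}

for label, count in counts.items():
    print(f"{label}: {count} of {N_QUESTIONS}")
```

Under that assumption, the figures correspond to 72 correct answers for GPT-4, 50 correct answers for ChatGPT, and 66 ChatGPT answers revised after the follow-up query.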

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/af26/10543445/903bd0cd1bf7/41598_2023_43436_Fig1_HTML.jpg
