
In-depth analysis of ChatGPT's performance based on specific signaling words and phrases in the question stem of 2377 USMLE step 1 style questions.

Affiliations

Department of Oral and Maxillofacial Surgery, Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität Zu Berlin, and Berlin Institute of Health, Berlin, Germany.

Department of Plastic Surgery and Hand Surgery, Klinikum Rechts Der Isar, Technical University of Munich, Munich, Germany.

Publication

Sci Rep. 2024 Jun 12;14(1):13553. doi: 10.1038/s41598-024-63997-7.


DOI: 10.1038/s41598-024-63997-7
PMID: 38866891
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11169536/
Abstract

ChatGPT has garnered attention as a multifaceted AI chatbot with potential applications in medicine. Despite intriguing preliminary findings in areas such as clinical management and patient education, a substantial knowledge gap remains in comprehensively understanding the chances and limitations of ChatGPT's capabilities, especially in medical test-taking and education. A total of n = 2,729 USMLE Step 1 practice questions were extracted from the Amboss question bank. After excluding 352 image-based questions, the remaining 2,377 text-based questions were categorized and entered manually into ChatGPT, and its responses were recorded. ChatGPT's overall performance was analyzed by question difficulty, category, and content with regard to specific signal words and phrases. ChatGPT achieved an overall accuracy rate of 55.8% across the 2,377 USMLE Step 1 preparation questions. It demonstrated a significant inverse correlation between question difficulty and performance (r = -0.306; p < 0.001), maintaining accuracy comparable to the human user peer group across difficulty levels. Notably, ChatGPT outperformed on serology-related questions (61.1% vs. 53.8%; p = 0.005) but struggled with ECG-related content (42.9% vs. 55.6%; p = 0.021). ChatGPT performed significantly worse on pathophysiology-related question stems (signal phrase: "what is the most likely/probable cause"). Otherwise, ChatGPT performed consistently across question categories and difficulty levels. These findings emphasize the need for further investigation into the potential and limitations of ChatGPT in medical examination and education.
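The comparisons reported in the abstract (a difficulty-performance correlation and category-wise accuracy contrasts with p values) rest on standard statistics. The sketch below is illustrative only: the helper functions and all sample numbers are hypothetical, not the authors' analysis code or data. It shows how a Pearson correlation and a two-proportion z-test of this kind can be computed.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def two_proportion_test(k1, n1, k2, n2):
    """Two-sided two-proportion z-test on k1/n1 vs. k2/n2; returns (z, p)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p from the standard normal CDF, expressed via erf.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Hypothetical data: per-question difficulty level (1 = easiest, 5 = hardest)
# and whether the model answered correctly (1/0).
difficulty = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
correct    = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

overall_accuracy = sum(correct) / len(correct)
r = pearson_r(difficulty, correct)  # negative r: harder questions missed more often

# Hypothetical category contrast, e.g. one category vs. all remaining questions.
z, p = two_proportion_test(55, 90, 1200, 2287)
```

In practice such an analysis would use a library implementation (e.g. `scipy.stats.pearsonr`) rather than hand-rolled helpers, but the arithmetic is the same.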


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c461/11169536/d8491bc4ca4b/41598_2024_63997_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c461/11169536/59b19ac3d4f6/41598_2024_63997_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c461/11169536/4b3cb9ae520d/41598_2024_63997_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c461/11169536/4f9f9c27ce3f/41598_2024_63997_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c461/11169536/daac16370527/41598_2024_63997_Fig5_HTML.jpg

Similar articles

[1]
In-depth analysis of ChatGPT's performance based on specific signaling words and phrases in the question stem of 2377 USMLE step 1 style questions.

Sci Rep. 2024-6-12

[2]
Pure Wisdom or Potemkin Villages? A Comparison of ChatGPT 3.5 and ChatGPT 4 on USMLE Step 3 Style Questions: Quantitative Analysis.

JMIR Med Educ. 2024-1-5

[3]
How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment.

JMIR Med Educ. 2023-2-8

[4]
Sailing the Seven Seas: A Multinational Comparison of ChatGPT's Performance on Medical Licensing Examinations.

Ann Biomed Eng. 2024-6

[5]
Exploring the Performance of ChatGPT Versions 3.5, 4, and 4 With Vision in the Chilean Medical Licensing Examination: Observational Study.

JMIR Med Educ. 2024-4-29

[6]
ChatGPT's performance in German OB/GYN exams - paving the way for AI-enhanced medical education and clinical practice.

Front Med (Lausanne). 2023-12-13

[7]
Assessing question characteristic influences on ChatGPT's performance and response-explanation consistency: Insights from Taiwan's Nursing Licensing Exam.

Int J Nurs Stud. 2024-5

[8]
Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study.

JMIR Med Educ. 2024-2-9

[9]
Assessing ChatGPT 4.0's test performance and clinical diagnostic accuracy on USMLE STEP 2 CK and clinical case reports.

Sci Rep. 2024-4-23

[10]
Comparing ChatGPT and GPT-4 performance in USMLE soft skill assessments.

Sci Rep. 2023-10-1

Cited by

[1]
Performance of ChatGPT in answering the oral pathology questions of various types or subjects from Taiwan National Dental Licensing Examinations.

J Dent Sci. 2025-7

[2]
Quantum leap in medical mentorship: exploring ChatGPT's transition from textbooks to terabytes.

Front Med (Lausanne). 2025-4-28

[3]
Harnessing advanced large language models in otolaryngology board examinations: an investigation using python and application programming interfaces.

Eur Arch Otorhinolaryngol. 2025-4-25

[4]
Precision Oncology in Non-small Cell Lung Cancer: A Comparative Study of Contextualized ChatGPT Models.

Cureus. 2025-3-24

[5]
Analyzing Question Characteristics Influencing ChatGPT's Performance in 3000 USMLE®-Style Questions.

Med Sci Educ. 2024-9-28

[6]
Advancements in AI Medical Education: Assessing ChatGPT's Performance on USMLE-Style Questions Across Topics and Difficulty Levels.

Cureus. 2024-12-24

[7]
Understanding AI's Role in Endometriosis Patient Education and Evaluating Its Information and Accuracy: Systematic Review.

JMIR AI. 2024-10-30

[8]
Assessment Study of ChatGPT-3.5's Performance on the Final Polish Medical Examination: Accuracy in Answering 980 Questions.

Healthcare (Basel). 2024-8-16

