Similar Articles

1
Examining ChatGPT Performance on USMLE Sample Items and Implications for Assessment.
Acad Med. 2024 Feb 1;99(2):192-197. doi: 10.1097/ACM.0000000000005549. Epub 2023 Nov 7.
2
ChatGPT-4: An assessment of an upgraded artificial intelligence chatbot in the United States Medical Licensing Examination.
Med Teach. 2024 Mar;46(3):366-372. doi: 10.1080/0142159X.2023.2249588. Epub 2023 Oct 15.
3
Pure Wisdom or Potemkin Villages? A Comparison of ChatGPT 3.5 and ChatGPT 4 on USMLE Step 3 Style Questions: Quantitative Analysis.
JMIR Med Educ. 2024 Jan 5;10:e51148. doi: 10.2196/51148.
4
Performance of ChatGPT on Ophthalmology-Related Questions Across Various Examination Levels: Observational Study.
JMIR Med Educ. 2024 Jan 18;10:e50842. doi: 10.2196/50842.
5
Advancements in AI Medical Education: Assessing ChatGPT's Performance on USMLE-Style Questions Across Topics and Difficulty Levels.
Cureus. 2024 Dec 24;16(12):e76309. doi: 10.7759/cureus.76309. eCollection 2024 Dec.
6
How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment.
JMIR Med Educ. 2023 Feb 8;9:e45312. doi: 10.2196/45312.
7
Unveiling GPT-4V's hidden challenges behind high accuracy on USMLE questions: Observational Study.
J Med Internet Res. 2025 Feb 7;27:e65146. doi: 10.2196/65146.
8
ChatGPT-4 Omni Performance in USMLE Disciplines and Clinical Skills: Comparative Analysis.
JMIR Med Educ. 2024 Nov 6;10:e63430. doi: 10.2196/63430.
9
Analyzing Question Characteristics Influencing ChatGPT's Performance in 3000 USMLE®-Style Questions.
Med Sci Educ. 2024 Sep 28;35(1):257-267. doi: 10.1007/s40670-024-02176-9. eCollection 2025 Feb.
10
Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study.
JMIR Med Educ. 2024 Feb 9;10:e48514. doi: 10.2196/48514.

Cited By

1
Context Matching is not Reasoning: Assessing Generalized Evaluation of Generative Language Models in Clinical Settings.
Res Sq. 2025 Aug 29:rs.3.rs-7325383. doi: 10.21203/rs.3.rs-7325383/v1.
2
The performance of ChatGPT on medical image-based assessments and implications for medical education.
BMC Med Educ. 2025 Aug 23;25(1):1192. doi: 10.1186/s12909-025-07752-0.
3
Improving Patient Communication by Simplifying AI-Generated Dental Radiology Reports With ChatGPT: Comparative Study.
J Med Internet Res. 2025 Jun 9;27:e73337. doi: 10.2196/73337.
4
Semantic Clinical Artificial Intelligence vs Native Large Language Model Performance on the USMLE.
JAMA Netw Open. 2025 Apr 1;8(4):e256359. doi: 10.1001/jamanetworkopen.2025.6359.
5
ChatGPT and Other Large Language Models in Medical Education - Scoping Literature Review.
Med Sci Educ. 2024 Nov 13;35(1):555-567. doi: 10.1007/s40670-024-02206-6. eCollection 2025 Feb.
6
Evaluating the Accuracy of ChatGPT in the Japanese Board-Certified Physiatrist Examination.
Cureus. 2024 Dec 22;16(12):e76214. doi: 10.7759/cureus.76214. eCollection 2024 Dec.
7
ChatGPT's Attitude, Knowledge, and Clinical Application in Geriatrics Practice and Education: Exploratory Observational Study.
JMIR Form Res. 2025 Jan 3;9:e63494. doi: 10.2196/63494.
8
AI in Dental Radiology-Improving the Efficiency of Reporting With ChatGPT: Comparative Study.
J Med Internet Res. 2024 Dec 23;26:e60684. doi: 10.2196/60684.
9
ChatGPT-4 Omni Performance in USMLE Disciplines and Clinical Skills: Comparative Analysis.
JMIR Med Educ. 2024 Nov 6;10:e63430. doi: 10.2196/63430.
10
Evaluating the Accuracy of Large Language Model (ChatGPT) in Providing Information on Metastatic Breast Cancer.
Adv Pharm Bull. 2024 Oct;14(3):499-503. doi: 10.34172/apb.2024.060. Epub 2024 Jul 31.

References

1
Can large language models reason about medical questions?
Patterns (N Y). 2024 Mar 1;5(3):100943. doi: 10.1016/j.patter.2024.100943. eCollection 2024 Mar 8.
2
Large language models encode clinical knowledge.
Nature. 2023 Aug;620(7972):172-180. doi: 10.1038/s41586-023-06291-2. Epub 2023 Jul 12.
3
Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine.
N Engl J Med. 2023 Mar 30;388(13):1233-1239. doi: 10.1056/NEJMsr2214184.
4
Artificial Intelligence in Medicine.
N Engl J Med. 2023 Mar 30;388(13):1220-1221. doi: 10.1056/NEJMe2206291.
5
Artificial Intelligence and Machine Learning in Clinical Medicine, 2023.
N Engl J Med. 2023 Mar 30;388(13):1201-1208. doi: 10.1056/NEJMra2302038.
6
Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models.
PLOS Digit Health. 2023 Feb 9;2(2):e0000198. doi: 10.1371/journal.pdig.0000198. eCollection 2023 Feb.
7
How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment.
JMIR Med Educ. 2023 Feb 8;9:e45312. doi: 10.2196/45312.

Examining ChatGPT Performance on USMLE Sample Items and Implications for Assessment.

Publication Information

Acad Med. 2024 Feb 1;99(2):192-197. doi: 10.1097/ACM.0000000000005549. Epub 2023 Nov 7.

DOI: 10.1097/ACM.0000000000005549
PMID: 37934828
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11444356/
Abstract

PURPOSE

In late 2022 and early 2023, reports that ChatGPT could pass the United States Medical Licensing Examination (USMLE) generated considerable excitement, and the media response suggested that ChatGPT has credible medical knowledge. This report analyzes the extent to which an artificial intelligence (AI) agent's performance on publicly available USMLE sample items can generalize to performance on an actual USMLE examination; an illustration is given using ChatGPT.

METHOD

As with earlier investigations, analyses were based on publicly available USMLE sample items. Each item was submitted to ChatGPT (version 3.5) 3 times to evaluate stability. Responses were scored following rules that match operational practice, and a preliminary analysis explored the characteristics of items that ChatGPT answered correctly. The study was conducted between February and March 2023.
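
The repeated-submission protocol described above can be illustrated with a short sketch. The following is a minimal illustration under stated assumptions, not the authors' code: `ask_chatgpt` is a hypothetical wrapper around whatever chat-completion API is used, and the scoring rule (exact match of the returned answer letter against the key, with an item flagged as unstable when the three replications disagree) is an assumption for illustration only.

```python
# Minimal sketch of the repeated-submission protocol (assumptions noted above).
# `ask_chatgpt` is a hypothetical callable: prompt string in, answer letter out.

N_REPLICATIONS = 3  # each sample item was submitted three times


def evaluate_items(items, ask_chatgpt):
    """items: iterable of dicts with 'id', 'prompt', and a keyed 'answer' (e.g. 'C')."""
    results = []
    for item in items:
        answers = [ask_chatgpt(item["prompt"]).strip().upper()
                   for _ in range(N_REPLICATIONS)]
        results.append({
            "id": item["id"],
            "n_correct": sum(a == item["answer"] for a in answers),
            # an item is "stable" only if all replications return the same answer
            "stable": len(set(answers)) == 1,
        })
    return results


def summarize(results):
    n_items = len(results)
    pct_correct = sum(r["n_correct"] for r in results) / (n_items * N_REPLICATIONS)
    pct_unstable = sum(not r["stable"] for r in results) / n_items
    return pct_correct, pct_unstable
```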

RESULTS

For the full sample of items, ChatGPT scored above 60% correct except for one replication for Step 3. Response success varied across replications for 76 items (20%). There was a modest correspondence with item difficulty wherein ChatGPT was more likely to respond correctly to items found easier by examinees. ChatGPT performed significantly worse (P < .001) on items relating to practice-based learning.
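
The item-level relationships reported here (correspondence with examinee difficulty, and weaker performance on practice-based-learning items) could be examined with an analysis along the following lines. This is a hedged sketch only: the data layout and column names (`gpt_correct`, `examinee_p`, `category`) are assumptions, and the statistics shown (a point-biserial correlation and Fisher's exact test) are common choices for this kind of item analysis, not necessarily the ones used in the paper.

```python
# Illustrative item-level analysis; column names and tests are assumptions.
import pandas as pd
from scipy.stats import pointbiserialr, fisher_exact


def analyze_items(item_df: pd.DataFrame):
    # Assumed columns: 'gpt_correct' (0/1 per item), 'examinee_p' (proportion of
    # examinees answering the item correctly), 'category' (content-area label).

    # Correspondence between ChatGPT correctness and examinee item difficulty.
    r, p_difficulty = pointbiserialr(item_df["gpt_correct"], item_df["examinee_p"])

    # Accuracy on practice-based-learning items vs. all other items.
    is_pbl = item_df["category"] == "practice_based_learning"
    table = [
        [int(item_df.loc[is_pbl, "gpt_correct"].sum()),
         int((1 - item_df.loc[is_pbl, "gpt_correct"]).sum())],
        [int(item_df.loc[~is_pbl, "gpt_correct"].sum()),
         int((1 - item_df.loc[~is_pbl, "gpt_correct"]).sum())],
    ]
    _, p_category = fisher_exact(table)

    return {"difficulty_r": r, "difficulty_p": p_difficulty, "category_p": p_category}
```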

CONCLUSIONS

Achieving 60% accuracy is an approximate indicator of meeting the passing standard, requiring statistical adjustments for comparison. Hence, this assessment can only suggest consistency with the passing standards for Steps 1 and 2 Clinical Knowledge, with further limitations in extrapolating this inference to Step 3. These limitations are due to variances in item difficulty and the exclusion of the simulation component of Step 3 from the evaluation, limitations that would apply to any AI system evaluated on the Step 3 sample items. It is crucial to note that responses from large language models exhibit notable variations when faced with repeated inquiries, underscoring the need for expert validation to ensure their utility as a learning tool.
