

Similar Articles

1. Exploring the Performance of ChatGPT Versions 3.5, 4, and 4 With Vision in the Chilean Medical Licensing Examination: Observational Study. JMIR Med Educ. 2024 Apr 29;10:e55048. doi: 10.2196/55048.
2. Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis. J Med Internet Res. 2024 Jul 25;26:e60807. doi: 10.2196/60807.
3. Unveiling GPT-4V's hidden challenges behind high accuracy on USMLE questions: Observational Study. J Med Internet Res. 2025 Feb 7;27:e65146. doi: 10.2196/65146.
4. Performance of ChatGPT-3.5 and GPT-4 in national licensing examinations for medicine, pharmacy, dentistry, and nursing: a systematic review and meta-analysis. BMC Med Educ. 2024 Sep 16;24(1):1013. doi: 10.1186/s12909-024-05944-8.
5. Assessing question characteristic influences on ChatGPT's performance and response-explanation consistency: Insights from Taiwan's Nursing Licensing Exam. Int J Nurs Stud. 2024 May;153:104717. doi: 10.1016/j.ijnurstu.2024.104717. Epub 2024 Feb 8.
6. Sailing the Seven Seas: A Multinational Comparison of ChatGPT's Performance on Medical Licensing Examinations. Ann Biomed Eng. 2024 Jun;52(6):1542-1545. doi: 10.1007/s10439-023-03338-3. Epub 2023 Aug 8.
7. ChatGPT's Performance on Portuguese Medical Examination Questions: Comparative Analysis of ChatGPT-3.5 Turbo and ChatGPT-4o Mini. JMIR Med Educ. 2025 Mar 5;11:e65108. doi: 10.2196/65108.
8. Performance of ChatGPT on the Peruvian National Licensing Medical Examination: Cross-Sectional Study. JMIR Med Educ. 2023 Sep 28;9:e48039. doi: 10.2196/48039.
9. ChatGPT-4 Omni Performance in USMLE Disciplines and Clinical Skills: Comparative Analysis. JMIR Med Educ. 2024 Nov 6;10:e63430. doi: 10.2196/63430.
10. Pure Wisdom or Potemkin Villages? A Comparison of ChatGPT 3.5 and ChatGPT 4 on USMLE Step 3 Style Questions: Quantitative Analysis. JMIR Med Educ. 2024 Jan 5;10:e51148. doi: 10.2196/51148.

Cited By

1. Intra-axial primary brain tumor differentiation: comparing large language models on structured MRI reports vs. radiologists on images. Eur Radiol. 2025 Aug 22. doi: 10.1007/s00330-025-11924-3.
2. Clinical Performance and Communication Skills of ChatGPT Versus Physicians in Emergency Medicine: Simulated Patient Study. JMIR Med Inform. 2025 Jul 17;13:e68409. doi: 10.2196/68409.
3. Comparative Performance of Medical Students, ChatGPT-3.5 and ChatGPT-4.0 in Answering Questions From a Brazilian National Medical Exam: Cross-Sectional Questionnaire Study. JMIR AI. 2025 May 8;4:e66552. doi: 10.2196/66552.
4. Performance of single-agent and multi-agent language models in Spanish language medical competency exams. BMC Med Educ. 2025 May 7;25(1):666. doi: 10.1186/s12909-025-07250-3.
5. Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis. J Med Internet Res. 2025 Apr 30;27:e64486. doi: 10.2196/64486.
6. Preliminary evaluation of ChatGPT model iterations in emergency department diagnostics. Sci Rep. 2025 Mar 26;15(1):10426. doi: 10.1038/s41598-025-95233-1.
7. Use of artificial intelligence-generated multiple-choice questions for the examination of surgical subspecialty residents: report of feasibility and psychometric analysis. Can Urol Assoc J. 2025 Jun;19(6):182-187. doi: 10.5489/cuaj.9020.
8. Accuracy of latest large language models in answering multiple choice questions in dentistry: A comparative study. PLoS One. 2025 Jan 29;20(1):e0317423. doi: 10.1371/journal.pone.0317423. eCollection 2025.
9. Assessing the Current Limitations of Large Language Models in Advancing Health Care Education. JMIR Form Res. 2025 Jan 16;9:e51319. doi: 10.2196/51319.
10. Clinical Characteristics of Children with Acute Post-Streptococcal Glomerulonephritis and Re-Evaluation of Patients with Artificial Intelligence. Medeni Med J. 2024 Sep 30;39(3):221-229. doi: 10.4274/MMJ.galenos.2024.09382.


Exploring the Performance of ChatGPT Versions 3.5, 4, and 4 With Vision in the Chilean Medical Licensing Examination: Observational Study

Author Affiliations

Graduate School of Education, Stanford University, Stanford, CA, United States.

School of Medicine, Universidad de Chile, Santiago, Chile.

Publication Information

JMIR Med Educ. 2024 Apr 29;10:e55048. doi: 10.2196/55048.
PMID: 38686550
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11082432/
Abstract

BACKGROUND

The deployment of OpenAI's ChatGPT-3.5 and its subsequent versions, ChatGPT-4 and ChatGPT-4 With Vision (4V; also known as "GPT-4 Turbo With Vision"), has notably influenced the medical field. Having demonstrated remarkable performance in medical examinations globally, these models show potential for educational applications. However, their effectiveness in non-English contexts, particularly in Chile's medical licensing examination, a critical step for medical practitioners in Chile, is less explored. This gap highlights the need to evaluate ChatGPT's adaptability to diverse linguistic and cultural contexts.

OBJECTIVE

This study aims to evaluate the performance of ChatGPT versions 3.5, 4, and 4V in the EUNACOM (Examen Único Nacional de Conocimientos de Medicina), a major medical examination in Chile.

METHODS

Three official practice drills (540 questions) from the University of Chile, mirroring the EUNACOM's structure and difficulty, were used to test ChatGPT versions 3.5, 4, and 4V. Each of the 3 versions was given 3 attempts per drill, and the responses from each attempt were systematically categorized and analyzed to assess accuracy.

RESULTS

All versions of ChatGPT passed the EUNACOM drills. Specifically, versions 4 and 4V outperformed version 3.5, achieving average accuracy rates of 79.32% and 78.83%, respectively, compared to 57.53% for version 3.5 (P<.001). Version 4V, however, did not outperform version 4 (P=.73), despite the additional visual capabilities. We also evaluated ChatGPT's performance in different medical areas of the EUNACOM and found that versions 4 and 4V consistently outperformed version 3.5. Across the different medical areas, version 3.5 displayed the highest accuracy in psychiatry (69.84%), while versions 4 and 4V achieved the highest accuracy in surgery (90.00% and 86.11%, respectively). Versions 3.5 and 4 had the lowest performance in internal medicine (52.74% and 75.62%, respectively), while version 4V had the lowest performance in public health (74.07%).

CONCLUSIONS

This study reveals ChatGPT's ability to pass the EUNACOM, with distinct proficiencies across versions 3.5, 4, and 4V. Notably, advancements in artificial intelligence (AI) have not led to significant improvements in performance on image-based questions. The variations in proficiency across medical fields suggest the need for more nuanced AI training. Additionally, the study underscores the importance of exploring innovative approaches to using AI to augment human cognition and enhance the learning process. Such advancements have the potential to significantly influence medical education, fostering not only knowledge acquisition but also the development of critical thinking and problem-solving skills among health care professionals.
