Performance of ChatGPT-4o on the Japanese Medical Licensing Examination: Evalution of Accuracy in Text-Only and Image-Based Questions.

Authors

Miyazaki Yuki, Hata Masahiro, Omori Hisaki, Hirashima Atsuya, Nakagawa Yuta, Eto Mitsuhiro, Takahashi Shun, Ikeda Manabu

Affiliations

Department of Psychiatry, Osaka University Graduate School of Medicine, Suita, Japan.

Department of Psychiatry, Shichiyama Hospital, Sennan District, Japan.

Publication information

JMIR Med Educ. 2024 Dec 24;10:e63129. doi: 10.2196/63129.

DOI: 10.2196/63129
PMID: 39718557
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11687171/
Abstract

This study evaluated the performance of ChatGPT with GPT-4 Omni (GPT-4o) on the 118th Japanese Medical Licensing Examination. The study focused on both text-only and image-based questions. The model demonstrated a high level of accuracy overall, with no significant difference in performance between text-only and image-based questions. Common errors included clinical judgment mistakes and prioritization issues, underscoring the need for further improvement in the integration of artificial intelligence into medical education and practice.

Similar articles

1. Performance of ChatGPT-4o on the Japanese Medical Licensing Examination: Evalution of Accuracy in Text-Only and Image-Based Questions. JMIR Med Educ. 2024 Dec 24;10:e63129. doi: 10.2196/63129.
2. ChatGPT-4 Omni Performance in USMLE Disciplines and Clinical Skills: Comparative Analysis. JMIR Med Educ. 2024 Nov 6;10:e63430. doi: 10.2196/63430.
3. Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis. J Med Internet Res. 2024 Jul 25;26:e60807. doi: 10.2196/60807.
4. Evaluating the Effectiveness of advanced large language models in medical Knowledge: A Comparative study using Japanese national medical examination. Int J Med Inform. 2025 Jan;193:105673. doi: 10.1016/j.ijmedinf.2024.105673. Epub 2024 Oct 28.
5. Evaluating the performance of GPT-3.5, GPT-4, and GPT-4o in the Chinese National Medical Licensing Examination. Sci Rep. 2025 Apr 23;15(1):14119. doi: 10.1038/s41598-025-98949-2.
6. Unveiling GPT-4V's hidden challenges behind high accuracy on USMLE questions: Observational Study. J Med Internet Res. 2025 Feb 7;27:e65146. doi: 10.2196/65146.
7. GPT-4/4V's performance on the Japanese National Medical Licensing Examination. Med Teach. 2025 Mar;47(3):450-457. doi: 10.1080/0142159X.2024.2342545. Epub 2024 Apr 22.
8. Performance of ChatGPT and Bard on the medical licensing examinations varies across different cultures: a comparison study. BMC Med Educ. 2024 Nov 26;24(1):1372. doi: 10.1186/s12909-024-06309-x.
9. Performance of ChatGPT-3.5 and GPT-4 in national licensing examinations for medicine, pharmacy, dentistry, and nursing: a systematic review and meta-analysis. BMC Med Educ. 2024 Sep 16;24(1):1013. doi: 10.1186/s12909-024-05944-8.
10. Performance of ChatGPT-4 on Taiwanese Traditional Chinese Medicine Licensing Examinations: Cross-Sectional Study. JMIR Med Educ. 2025 Mar 19;11:e58897. doi: 10.2196/58897.

Cited by

1. Assessing accuracy and legitimacy of multimodal large language models on Japan Diagnostic Radiology Board Examination. Jpn J Radiol. 2025 Sep 12. doi: 10.1007/s11604-025-01861-y.
2. Performance analysis of large language models Chatgpt-4o, OpenAI O1, and OpenAI O3 mini in clinical treatment of pneumonia: a comparative study. Clin Exp Med. 2025 Jun 20;25(1):213. doi: 10.1007/s10238-025-01743-7.
3. Evaluating performance of large language models for atrial fibrillation management using different prompting strategies and languages. Sci Rep. 2025 May 30;15(1):19028. doi: 10.1038/s41598-025-04309-5.
4. Assessment of ChatGPT's adherence to evidence-based clinical practice guidelines for plantar fasciitis management. J Orthop Surg Res. 2025 Apr 30;20(1):434. doi: 10.1186/s13018-025-05831-y.
5. Can deepseek and ChatGPT be used in the diagnosis of oral pathologies? BMC Oral Health. 2025 Apr 25;25(1):638. doi: 10.1186/s12903-025-06034-x.
6. Extracting Pulmonary Embolism Diagnoses From Radiology Impressions Using GPT-4o: Large Language Model Evaluation Study. JMIR Med Inform. 2025 Apr 9;13:e67706. doi: 10.2196/67706.
7. AI versus human-generated multiple-choice questions for medical education: a cohort study in a high-stakes examination. BMC Med Educ. 2025 Feb 8;25(1):208. doi: 10.1186/s12909-025-06796-6.
