Capability of GPT-4V(ision) in the Japanese National Medical Licensing Examination: Evaluation Study.

Author information

Nakao Takahiro, Miki Soichiro, Nakamura Yuta, Kikuchi Tomohiro, Nomura Yukihiro, Hanaoka Shouhei, Yoshikawa Takeharu, Abe Osamu

Affiliations

Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Bunkyo-ku, Tokyo, Japan.

Department of Radiology, School of Medicine, Jichi Medical University, Shimotsuke, Tochigi, Japan.

Publication information

JMIR Med Educ. 2024 Mar 12;10:e54393. doi: 10.2196/54393.

DOI: 10.2196/54393
PMID: 38470459
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10966435/
Abstract

BACKGROUND

Previous research applying large language models (LLMs) to medicine has focused on text-based information. Recently, multimodal variants of LLMs have acquired the capability to recognize images.

OBJECTIVE

We aimed to evaluate the image recognition capability of generative pretrained transformer (GPT)-4V, a recent multimodal LLM developed by OpenAI, in the medical field by testing how visual information affects its performance in answering questions from the 117th Japanese National Medical Licensing Examination.

METHODS

We focused on the 108 questions that included 1 or more images and presented GPT-4V with each question under 2 conditions: (1) with both the question text and the associated images and (2) with the question text only. We then compared accuracy between the 2 conditions using the exact McNemar test.
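
The exact McNemar test compares paired outcomes using only the discordant pairs, that is, questions answered correctly under one condition but not the other. Under the null hypothesis, each discordant question is equally likely to favor either condition, so the smaller discordant count follows a binomial distribution with p = 0.5. Below is a minimal Python sketch of the conventional two-sided version (doubled binomial tail, capped at 1). The function name `exact_mcnemar_p` is our own, the counts in the usage lines are hypothetical, and the abstract does not state which software the authors used; library routines such as the statsmodels `mcnemar` function with `exact=True` should agree with this calculation.

```python
from math import comb


def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact McNemar test P value.

    b: number of items correct only under condition 1 (e.g., with images)
    c: number of items correct only under condition 2 (e.g., text only)
    Items answered the same way under both conditions do not enter the test.
    """
    n = b + c
    if n == 0:
        # No discordant pairs: no evidence of any difference.
        return 1.0
    k = min(b, c)
    # One-tailed probability P(X <= k) for X ~ Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    # Double the tail for a two-sided test and cap the result at 1.
    return min(1.0, 2 * tail)


# Hypothetical discordant counts for illustration only (not from the study).
print(exact_mcnemar_p(2, 10))  # about 0.0386
print(exact_mcnemar_p(1, 0))   # 1.0
```

Whenever |b − c| = 1, the two binomial tails are symmetric and the exact two-sided P value is 1 regardless of how many discordant pairs there are. This is consistent with the P≥.99 reported for the general questions, where the correct counts under the 2 conditions (3 vs 2) differ by exactly 1.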

RESULTS

Among the 108 questions with images, GPT-4V's accuracy was 68% (73/108) when presented with images and 72% (78/108) when presented without images (P=.36). For the 2 question categories, accuracy with versus without images was 71% (70/98) versus 78% (76/98; P=.21) for clinical questions and 30% (3/10) versus 20% (2/10; P≥.99) for general questions.
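
The reported percentages follow directly from the counts, and the paired design adds one constraint worth making explicit. If a is the number of questions answered correctly under both conditions, b the number correct only with images, and c the number correct only without images, then the with-image total is a + b and the text-only total is a + c, so their difference equals c − b. The abstract does not report b and c separately, so the P values cannot be recomputed from these totals alone. A small Python check of the arithmetic follows; the helper `accuracy_pct` and the `REPORTED` table are our own names, filled with the counts stated in the abstract.

```python
def accuracy_pct(correct: int, total: int) -> int:
    """Accuracy as a whole-number percentage."""
    return round(100 * correct / total)


# Counts from the abstract: (correct with images, correct without images, questions)
REPORTED = {
    "all": (73, 78, 108),
    "clinical": (70, 76, 98),
    "general": (3, 2, 10),
}

# The clinical and general categories partition the 108 image-based questions.
for i in range(3):
    assert REPORTED["all"][i] == REPORTED["clinical"][i] + REPORTED["general"][i]

for category, (with_img, without_img, total) in REPORTED.items():
    # without_img - with_img = (a + c) - (a + b) = c - b
    diff = without_img - with_img
    print(f"{category}: {accuracy_pct(with_img, total)}% vs "
          f"{accuracy_pct(without_img, total)}%, c - b = {diff}")
```

This prints 68% vs 72% (c − b = 5) overall, 71% vs 78% (c − b = 6) for clinical questions, and 30% vs 20% (c − b = −1) for general questions, matching the figures in the abstract.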

CONCLUSIONS

The additional information from the images did not significantly improve the performance of GPT-4V in the Japanese National Medical Licensing Examination.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5266/10966435/4dac834471c5/mededu_v10i1e54393_fig1.jpg
