Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study.

Authors

Takagi Soshi, Watari Takashi, Erabi Ayano, Sakaguchi Kota

Affiliations

Faculty of Medicine, Shimane University, Izumo, Japan.

General Medicine Center, Shimane University Hospital, Izumo, Japan.

Publication information

JMIR Med Educ. 2023 Jun 29;9:e48002. doi: 10.2196/48002.

DOI: 10.2196/48002
PMID: 37384388
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10365615/

Abstract

BACKGROUND: The competence of ChatGPT (Chat Generative Pre-Trained Transformer) in non-English languages is not well studied.

OBJECTIVE: This study compared the performances of GPT-3.5 (Generative Pre-trained Transformer) and GPT-4 on the Japanese Medical Licensing Examination (JMLE) to evaluate the reliability of these models for clinical reasoning and medical knowledge in non-English languages.

METHODS: This study used the default mode of ChatGPT, which is based on GPT-3.5; the GPT-4 model of ChatGPT Plus; and the 117th JMLE in 2023. A total of 254 questions were included in the final analysis, which were categorized into 3 types, namely general, clinical, and clinical sentence questions.

RESULTS: The results indicated that GPT-4 outperformed GPT-3.5 in terms of accuracy, particularly for general, clinical, and clinical sentence questions. GPT-4 also performed better on difficult questions and specific disease questions. Furthermore, GPT-4 achieved the passing criteria for the JMLE, indicating its reliability for clinical reasoning and medical knowledge in non-English languages.

CONCLUSIONS: GPT-4 could become a valuable tool for medical education and clinical support in non-English-speaking regions, such as Japan.
