Kaneda Yudai, Takahashi Ryo, Kaneda Uiri, Akashima Shiori, Okita Haruna, Misaki Sadaya, Yamashiro Akimi, Ozaki Akihiko, Tanimoto Tetsuya
College of Medicine, Hokkaido University, Hokkaido, JPN.
Department of Rehabilitation Medicine, Sonodakai Joint Replacement Center Hospital, Tokyo, JPN.
Cureus. 2023 Aug 3;15(8):e42924. doi: 10.7759/cureus.42924. eCollection 2023 Aug.
Purpose The purpose of this study was to evaluate the changes in capabilities between the Generative Pre-trained Transformer (GPT)-3.5 and GPT-4 versions of the large language model ChatGPT in a Japanese medical context.

Methods ChatGPT versions 3.5 and 4 answered questions from the 112th Japanese National Nursing Examination (JNNE). The study comprised three analyses: calculation of correct answer rates and score rates, comparison between GPT-3.5 and GPT-4, and comparison of correct answer rates for conversation questions.

Results ChatGPT versions 3.5 and 4 responded to 237 of the 238 Japanese questions on the 112th JNNE. GPT-3.5 achieved an overall accuracy rate of 59.9% and failed to meet the passing standards, scoring 58.0% on compulsory questions and 58.3% on general/scenario-based questions. GPT-4 achieved an overall accuracy rate of 79.7% and satisfied the passing standards, scoring 90.0% and 77.7%, respectively. GPT-4 showed a higher accuracy rate than GPT-3.5 for every question type: accuracy on compulsory questions improved from 58.0% with GPT-3.5 to 90.0% with GPT-4, on general questions from 64.6% to 75.6%, and on scenario-based questions from 51.7% to 80.0%. For conversation questions, GPT-3.5 had an accuracy rate of 73.3% and GPT-4 had an accuracy rate of 93.3%.

Conclusions The GPT-4 version of ChatGPT performed well enough to pass the JNNE, a substantial improvement over GPT-3.5. This suggests that, with specialized medical training, such models could be beneficial in Japanese clinical settings by aiding decision-making. However, user awareness and training are crucial, given the potential for inaccuracies in ChatGPT's responses. Responsible use, grounded in an understanding of the model's capabilities and limitations, is therefore vital to best support healthcare professionals and patients.
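The correct answer rate and score rate calculations described in the methods amount to tallying graded responses per question category and comparing the resulting rates against passing cutoffs. The following is a minimal sketch, not the authors' code; the grading format, category labels, and threshold values are illustrative placeholders, not the official JNNE cutoffs.

```python
# Minimal sketch: per-category accuracy for model answers graded against an
# answer key. Categories and thresholds below are illustrative assumptions.
from collections import defaultdict

def accuracy_by_category(graded):
    """graded: iterable of (category, is_correct) pairs; returns {category: rate}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, is_correct in graded:
        total[category] += 1
        correct[category] += int(is_correct)
    return {c: correct[c] / total[c] for c in total}

# Hypothetical graded results for one model run; in the study, each ChatGPT
# answer was scored against the official examination answer key.
graded = [
    ("compulsory", True), ("compulsory", True), ("compulsory", False),
    ("general", True), ("scenario", True), ("scenario", False),
]

rates = accuracy_by_category(graded)
print(rates)  # e.g. {'compulsory': 0.67, 'general': 1.0, 'scenario': 0.5}

# Illustrative pass check with placeholder thresholds (not the official cutoffs).
PASS_THRESHOLDS = {"compulsory": 0.80, "general": 0.60, "scenario": 0.60}
for category, rate in rates.items():
    print(category, "pass" if rate >= PASS_THRESHOLDS[category] else "fail")
```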