Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination.

Authors

Kaneda Yudai, Takahashi Ryo, Kaneda Uiri, Akashima Shiori, Okita Haruna, Misaki Sadaya, Yamashiro Akimi, Ozaki Akihiko, Tanimoto Tetsuya

Affiliations

College of Medicine, Hokkaido University, Hokkaido, JPN.

Department of Rehabilitation Medicine, Sonodakai Joint Replacement Center Hospital, Tokyo, JPN.

Publication information

Cureus. 2023 Aug 3;15(8):e42924. doi: 10.7759/cureus.42924. eCollection 2023 Aug.

DOI:10.7759/cureus.42924
PMID:37667724
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10475149/
Abstract

Purpose: The purpose of this study was to evaluate the changes in capabilities between the Generative Pre-trained Transformer (GPT)-3.5 and GPT-4 versions of the large-scale language model ChatGPT within a Japanese medical context.

Methods: The study involved ChatGPT versions 3.5 and 4 responding to questions from the 112th Japanese National Nursing Examination (JNNE). The study comprised three analyses: correct answer rate and score rate calculations, comparisons between GPT-3.5 and GPT-4, and comparisons of correct answer rates for conversation questions.

Results: ChatGPT versions 3.5 and 4 responded to 237 out of 238 Japanese questions from the 112th JNNE. While GPT-3.5 achieved an overall accuracy rate of 59.9%, failing to meet the passing standards in compulsory and general/scenario-based questions, scoring 58.0% and 58.3%, respectively, GPT-4 had an accuracy rate of 79.7%, satisfying the passing standards by scoring 90.0% and 77.7%, respectively. For each problem type, GPT-4 showed a higher accuracy rate than GPT-3.5. Specifically, the accuracy rates for compulsory questions improved from 58.0% with GPT-3.5 to 90.0% with GPT-4. For general questions, the rates went from 64.6% with GPT-3.5 to 75.6% with GPT-4. In scenario-based questions, the accuracy rates improved substantially from 51.7% with GPT-3.5 to 80.0% with GPT-4. For conversation questions, GPT-3.5 had an accuracy rate of 73.3% and GPT-4 had an accuracy rate of 93.3%.

Conclusions: The GPT-4 version of ChatGPT displayed performance sufficient to pass the JNNE, significantly improving from GPT-3.5. This suggests specialized medical training could make such models beneficial in Japanese clinical settings, aiding decision-making. However, user awareness and training are crucial, given potential inaccuracies in ChatGPT's responses. Hence, responsible usage with an understanding of its capabilities and limitations is vital to best support healthcare professionals and patients.
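The overall accuracy rates reported in the abstract can be sanity-checked against the 237 answered questions. The short Python sketch below back-calculates the number of correct answers implied by each rounded percentage. These counts are an inference made here for illustration; the abstract itself reports only percentages, and the helper name `implied_correct` is hypothetical.

```python
def implied_correct(rate_percent: float, total: int) -> int:
    """Return the whole number of correct answers closest to a reported rate."""
    return round(rate_percent / 100 * total)


ANSWERED = 237  # questions both models answered, as stated in the abstract

# Overall accuracy rates as reported in the abstract (percent).
reported = {"GPT-3.5": 59.9, "GPT-4": 79.7}

# Inferred counts (an assumption: derived from rounded percentages, not reported).
implied = {model: implied_correct(rate, ANSWERED) for model, rate in reported.items()}

for model, rate in reported.items():
    count = implied[model]
    recomputed = round(100 * count / ANSWERED, 1)
    print(f"{model}: about {count}/{ANSWERED} correct, {recomputed}% (reported {rate}%)")
```

Under this assumption the reported rates are consistent with roughly 142 and 189 correct answers out of 237 for GPT-3.5 and GPT-4, respectively.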
