

Performance of ChatGPT on Nursing Licensure Examinations in the United States and China: Cross-Sectional Study.

Affiliations

Department of Bone and Joint Surgery and Sports Medicine Center, The First Affiliated Hospital, Guangzhou, China.

Department of Joint Surgery and Sports Medicine, Zhuhai People's Hospital, Zhuhai City, China.

Publication

JMIR Med Educ. 2024 Oct 3;10:e52746. doi: 10.2196/52746.

PMID: 39363539
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11466054/
Abstract

BACKGROUND

The creation of large language models (LLMs) such as ChatGPT is an important step in the development of artificial intelligence, and their powerful language-understanding and generative capabilities give them great potential in medical education. The purpose of this study was to quantitatively evaluate and comprehensively analyze ChatGPT's performance on nursing licensure examination questions from the United States and China, namely the National Council Licensure Examination for Registered Nurses (NCLEX-RN) and China's National Nursing Licensure Examination (NNLE).

OBJECTIVE

This study aims to examine how well LLMs respond to NCLEX-RN and NNLE multiple-choice questions (MCQs) across different language inputs, to evaluate whether LLMs can serve as multilingual learning aids for nursing, and to assess whether they possess a repository of professional knowledge applicable to clinical nursing practice.

METHODS

First, we compiled 150 NCLEX-RN Practical MCQs, 240 NNLE Theoretical MCQs, and 240 NNLE Practical MCQs. The translation function of ChatGPT 3.5 was then used to translate the NCLEX-RN questions from English to Chinese and the NNLE questions from Chinese to English. Finally, both the original and translated versions of the MCQs were input into ChatGPT 4.0, ChatGPT 3.5, and Google Bard. The LLMs were compared by accuracy rate, and differences between language inputs were assessed.

RESULTS

The accuracy rates of ChatGPT 4.0 for NCLEX-RN Practical questions and their Chinese translations were 88.7% (133/150) and 79.3% (119/150), respectively. Although the difference was statistically significant (P=.03), the correct rate was generally satisfactory. ChatGPT 4.0 correctly answered 71.9% (169/235) of NNLE Theoretical MCQs and 69.1% (161/233) of NNLE Practical MCQs. Its accuracy on NNLE Theoretical and Practical MCQs translated into English was 71.5% (168/235; P=.92) and 67.8% (158/233; P=.77), respectively, with no statistically significant difference between language inputs. With English input, ChatGPT 3.5 (NCLEX-RN P=.003, NNLE Theoretical P<.001, NNLE Practical P=.12) and Google Bard (NCLEX-RN P<.001, NNLE Theoretical P<.001, NNLE Practical P<.001) had lower accuracy rates on nursing-related MCQs than ChatGPT 4.0. For ChatGPT 3.5, accuracy with English input was higher than with Chinese input, and the difference was statistically significant (NCLEX-RN P=.02, NNLE Practical P=.02). Whether submitted in Chinese or English, the NCLEX-RN and NNLE MCQs showed that ChatGPT 4.0 had the highest number of unique correct responses and the lowest number of unique incorrect responses among the 3 LLMs.
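The accuracy comparisons above can be reproduced from the reported counts. As a minimal sketch, the snippet below applies a two-sided two-proportion z-test to ChatGPT 4.0's NCLEX-RN results (133/150 for English input vs 119/150 for Chinese-translated input); the exact test the authors used is an assumption, but this approximation recovers the reported P=.03.

```python
import math

def two_proportion_z(c1, n1, c2, n2):
    """Two-sided two-proportion z-test on correct counts c1/n1 vs c2/n2.

    Note: the paper's exact statistical test is not stated here; this
    pooled z-test is an illustrative approximation.
    """
    p1, p2 = c1 / n1, c2 / n2
    pooled = (c1 + c2) / (n1 + n2)              # pooled success proportion
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# ChatGPT 4.0 on NCLEX-RN Practical MCQs: English vs Chinese-translated input
z, p = two_proportion_z(133, 150, 119, 150)
print(f"accuracy EN={133/150:.1%}, ZH={119/150:.1%}, z={z:.2f}, P={p:.3f}")
```

Running the same function on the NNLE counts (169/235 vs 168/235, and 161/233 vs 158/233) yields the large P values reported above, consistent with no significant language effect for ChatGPT 4.0 on the NNLE.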

CONCLUSIONS

This study of 618 nursing MCQs drawn from the NCLEX-RN and NNLE found that ChatGPT 4.0 outperformed ChatGPT 3.5 and Google Bard in accuracy. It handled both English and Chinese inputs well, underscoring its potential as a valuable tool in nursing education and clinical decision-making.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/25af/11466054/cbe70174074a/mededu-v10-e52746-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/25af/11466054/8fc02b61a40a/mededu-v10-e52746-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/25af/11466054/e0287502ec6e/mededu-v10-e52746-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/25af/11466054/9417598facff/mededu-v10-e52746-g004.jpg

Similar Articles

1
Performance of ChatGPT on Nursing Licensure Examinations in the United States and China: Cross-Sectional Study.
JMIR Med Educ. 2024 Oct 3;10:e52746. doi: 10.2196/52746.
2
Performance of ChatGPT on Chinese national medical licensing examinations: a five-year examination evaluation study for physicians, pharmacists and nurses.
BMC Med Educ. 2024 Feb 14;24(1):143. doi: 10.1186/s12909-024-05125-7.
3
Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis.
J Med Internet Res. 2024 Jul 25;26:e60807. doi: 10.2196/60807.
4
Qwen-2.5 Outperforms Other Large Language Models in the Chinese National Nursing Licensing Examination: Retrospective Cross-Sectional Comparative Study.
JMIR Med Inform. 2025 Jan 10;13:e63731. doi: 10.2196/63731.
5
Identifying Indicators of National Council Licensure Examination for Registered Nurses (NCLEX-RN) Success in Nursing Graduates in Newfoundland & Labrador.
Int J Nurs Educ Scholarsh. 2019 Aug 6;16(1):ijnes-2018-0060. doi: 10.1515/ijnes-2018-0060.
6
Integrating ChatGPT in Orthopedic Education for Medical Undergraduates: Randomized Controlled Trial.
J Med Internet Res. 2024 Aug 20;26:e57037. doi: 10.2196/57037.
7
Assessing question characteristic influences on ChatGPT's performance and response-explanation consistency: Insights from Taiwan's Nursing Licensing Exam.
Int J Nurs Stud. 2024 May;153:104717. doi: 10.1016/j.ijnurstu.2024.104717. Epub 2024 Feb 8.
8
Using large language models (ChatGPT, Copilot, PaLM, Bard, and Gemini) in Gross Anatomy course: Comparative analysis.
Clin Anat. 2025 Mar;38(2):200-210. doi: 10.1002/ca.24244. Epub 2024 Nov 21.
9
NCLEX-RN preparation resources available online in French: An integrative review.
Int Nurs Rev. 2022 Jun;69(2):211-220. doi: 10.1111/inr.12705. Epub 2021 Aug 5.
10
Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study.
JMIR Med Educ. 2024 Feb 9;10:e48514. doi: 10.2196/48514.

Cited By

1
Large Language Models in Nursing Education: Concept Analysis.
JMIR Nurs. 2025 Aug 22;8:e77948. doi: 10.2196/77948.
2
Performance evaluation of large language models for the national nursing examination in Japan.
Digit Health. 2025 May 27;11:20552076251346571. doi: 10.1177/20552076251346571. eCollection 2025 Jan-Dec.

References

1
Electronic Medical Record Data Missingness and Interruption in Antiretroviral Therapy Among Adults and Children Living With HIV in Haiti: Retrospective Longitudinal Study.
JMIR Pediatr Parent. 2024 Mar 6;7:e51574. doi: 10.2196/51574.
2
Performance of ChatGPT on Chinese national medical licensing examinations: a five-year examination evaluation study for physicians, pharmacists and nurses.
BMC Med Educ. 2024 Feb 14;24(1):143. doi: 10.1186/s12909-024-05125-7.
3
Beyond human in neurosurgical exams: ChatGPT's success in the Turkish neurosurgical society proficiency board exams.
Comput Biol Med. 2024 Feb;169:107807. doi: 10.1016/j.compbiomed.2023.107807. Epub 2023 Dec 10.
4
Evaluation of Large language model performance on the Multi-Specialty Recruitment Assessment (MSRA) exam.
Comput Biol Med. 2024 Jan;168:107794. doi: 10.1016/j.compbiomed.2023.107794. Epub 2023 Nov 30.
5
Evaluation of the Performance of Generative AI Large Language Models ChatGPT, Google Bard, and Microsoft Bing Chat in Supporting Evidence-Based Dentistry: Comparative Mixed Methods Study.
J Med Internet Res. 2023 Dec 28;25:e51580. doi: 10.2196/51580.
6
Accuracy of ChatGPT, Google Bard, and Microsoft Bing for Simplifying Radiology Reports.
Radiology. 2023 Nov;309(2):e232561. doi: 10.1148/radiol.232561.
7
ChatGPT's Ability to Assist with Clinical Documentation: A Randomized Controlled Trial.
J Am Acad Orthop Surg. 2024 Feb 1;32(3):123-129. doi: 10.5435/JAAOS-D-23-00474. Epub 2023 Nov 17.
8
Leveraging Large Language Models for Decision Support in Personalized Oncology.
JAMA Netw Open. 2023 Nov 1;6(11):e2343689. doi: 10.1001/jamanetworkopen.2023.43689.
9
Can novel multimodal chatbots such as Bing Chat Enterprise, ChatGPT-4 Pro, and Google Bard correctly interpret electrocardiogram images?
Resuscitation. 2023 Dec;193:110009. doi: 10.1016/j.resuscitation.2023.110009. Epub 2023 Oct 24.
10
A holistic approach to remote patient monitoring, fueled by ChatGPT and Metaverse technology: The future of nursing education.
Nurse Educ Today. 2023 Dec;131:105972. doi: 10.1016/j.nedt.2023.105972. Epub 2023 Sep 12.