Zhao Quantong, Wang Haiyan, Wang Ran, Cao Hongshi
Department of Nursing, Lequn Branch, The First Hospital of Jilin University, Changchun, Jilin, China; School of Nursing, Jilin University, Changchun, Jilin, China.
Department of Nursing, Lequn Branch, The First Hospital of Jilin University, Changchun, Jilin, China.
Nurse Educ Pract. 2025 Mar;84:104284. doi: 10.1016/j.nepr.2025.104284. Epub 2025 Feb 4.
This study aimed to build a Custom GPT designed specifically to answer questions from the Chinese Nursing Licensing Exam and to examine its accuracy and response quality.
Custom GPT could be an efficient tool in nursing education, but it has not yet been implemented in this field.
A quantitative, descriptive, cross-sectional design was used to evaluate the performance of a Custom GPT. We developed the Custom GPT by integrating customized knowledge and applying prompt engineering, retrieval-augmented generation, and semantic search. Its performance was compared with that of standard ChatGPT-4 on 720 questions from three mock exams for the 2024 Chinese Nursing Licensing Exam.
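As a rough illustration of the retrieval-augmented approach named in the methods, the sketch below ranks reference passages against a question and places the best matches into a prompt. It is not the authors' implementation: token-count cosine similarity stands in for a real embedding-based semantic search, and the passages, the function names (`vectorize`, `cosine`, `retrieve`, `build_prompt`), and the prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical reference passages; the study's actual knowledge base is not described here.
PASSAGES = [
    "Hand hygiene should be performed before and after patient contact.",
    "Normal adult resting heart rate is about 60 to 100 beats per minute.",
]

def vectorize(text):
    """Turn text into a bag-of-words count vector (a stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def retrieve(question, passages, k=1):
    """Return the k passages most similar to the question."""
    q = vectorize(question)
    return sorted(passages, key=lambda p: cosine(q, vectorize(p)), reverse=True)[:k]

def build_prompt(question, passages, k=1):
    """Assemble a prompt that grounds the answer in the retrieved passages."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return (
        "Answer using only the reference material below.\n\n"
        f"Reference material:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Calling `build_prompt("What is the normal resting heart rate for an adult?", PASSAGES)` yields a prompt containing only the heart-rate passage, which a language model would then be asked to complete.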
Custom GPT provided superior results, with its accuracy consistently exceeding 90% across all six parts of the exams, whereas the accuracy of ChatGPT-4 ranged from 73% to 89%. Furthermore, the performance of Custom GPT (accuracy, >85%) across different question types was superior to that of ChatGPT-4 (accuracy, 66-83%). The odds ratios consistently favored Custom GPT, indicating a significantly higher likelihood of correct responses (P < 0.05 for most comparisons). In generating explanations, Custom GPT tended to provide more concise and confident responses, whereas ChatGPT-4 provided longer, more speculative responses with a higher chance of inaccuracies and hallucinations.
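For readers unfamiliar with the odds ratio used in the comparison, the following sketch shows one common way to compute it from correct/incorrect counts, together with a Wald 95% confidence interval. The abstract does not state which procedure the authors used, so this is an assumed, generic calculation; the function name `odds_ratio` and the counts in the usage note are hypothetical and are not the study's data.

```python
import math

def odds_ratio(correct_a, total_a, correct_b, total_b, z=1.96):
    """Odds ratio of a correct answer for model A versus model B.

    Returns (odds ratio, lower bound, upper bound) using a Wald interval
    on the log scale. A 0.5 continuity correction is applied to every
    cell only when some cell is zero.
    """
    a, b = correct_a, total_a - correct_a   # model A: correct, incorrect
    c, d = correct_b, total_b - correct_b   # model B: correct, incorrect
    if min(a, b, c, d) == 0:
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper
```

With purely illustrative counts, `odds_ratio(110, 120, 96, 120)` compares a model that answers 110 of 120 questions correctly with one that answers 96 of 120. An interval whose lower bound lies above 1 corresponds to a difference that is significant at roughly the 0.05 level.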
This study demonstrated significant advantages of Custom GPT over ChatGPT-4 on the Chinese Nursing Licensing Exam, indicating strong potential in specific application scenarios and for expansion to other areas of nursing.