

Assessing question characteristic influences on ChatGPT's performance and response-explanation consistency: Insights from Taiwan's Nursing Licensing Exam.

Affiliations

Department of Nursing, Taipei Veterans General Hospital, Taipei, Taiwan.

Big Data Center, Taipei Veterans General Hospital, Taipei, Taiwan.

Publication Information

Int J Nurs Stud. 2024 May;153:104717. doi: 10.1016/j.ijnurstu.2024.104717. Epub 2024 Feb 8.

Abstract

BACKGROUND

This study investigates the integration of an artificial intelligence tool, specifically ChatGPT, into nursing education, addressing its effectiveness in exam preparation and self-assessment.

OBJECTIVE

This study aims to evaluate the performance of ChatGPT, one of the most promising artificial intelligence-driven language understanding tools, in answering question banks used for nursing licensing examination preparation. It further analyzes question characteristics that might impact the accuracy of ChatGPT-generated answers and examines its reliability through human expert review.

DESIGN

Cross-sectional survey comparing ChatGPT-generated answers and their explanations.

SETTING

400 questions from Taiwan's 2022 Nursing Licensing Exam.

METHODS

The study analyzed 400 questions from five distinct subjects of Taiwan's 2022 Nursing Licensing Exam using the ChatGPT model, which provided an answer and an in-depth explanation for each question. The impact of question characteristics, such as type and cognitive level, on the accuracy of ChatGPT-generated responses was assessed using logistic regression analysis. Additionally, human experts evaluated the explanation for each question, comparing it with the ChatGPT-generated answer to determine consistency.
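The logistic-regression analysis described above can be illustrated with a minimal sketch. For a single binary predictor (e.g. whether a question contains a clinical vignette), the odds ratio from a univariate logistic regression equals the cross-product ratio of the 2x2 table of outcomes, and a Wald 95 % confidence interval can be formed on the log-odds scale. The counts below are hypothetical, since the paper does not report the raw table.

```python
import math

# Hypothetical 2x2 table of ChatGPT errors by question type; these counts
# are illustrative only -- the paper does not publish the raw table.
#                 incorrect  correct
# vignette            a=35     b=85
# no vignette         c=42    d=238
a, b, c, d = 35, 85, 42, 238

# For one binary predictor, the univariate logistic-regression odds ratio
# equals the cross-product ratio of the 2x2 table.
odds_ratio = (a * d) / (b * c)

# Wald 95 % confidence interval, computed on the log-odds scale.
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
lo = math.exp(math.log(odds_ratio) - 1.96 * se)
hi = math.exp(math.log(odds_ratio) + 1.96 * se)

print(f"OR = {odds_ratio:.2f}, 95 % CI {lo:.2f}-{hi:.2f}")
```

With several predictors (question type, cognitive level, vignette presence), the adjusted odds ratios would instead come from exponentiating the coefficients of a multivariable logistic regression fit.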

RESULTS

ChatGPT achieved an overall accuracy of 80.75 % on Taiwan's National Nursing Exam, surpassing the passing threshold. Accuracy diverged significantly across test subjects, ranging from General Medicine at 88.75 %, Medical-Surgical Nursing at 80.0 %, Psychology and Community Nursing at 70.0 %, and Obstetrics and Gynecology Nursing at 67.5 %, down to Basic Nursing at 63.0 %. ChatGPT was more likely to produce incorrect responses for questions with certain characteristics, notably those with clinical vignettes [odds ratio 2.19, 95 % confidence interval 1.24-3.87, P = 0.007] and complex multiple-choice questions [odds ratio 2.37, 95 % confidence interval 1.00-5.60, P = 0.049]. Furthermore, 14.25 % of ChatGPT-generated answers were inconsistent with their explanations, reducing the overall accuracy to 74 %.
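The reported percentages fit a simple back-calculation over the 400 questions, under one plausible reading of the abstract (an assumption, since the derivation of the 74 % figure is not spelled out): the adjusted accuracy scores an answer correct only when its explanation agrees with it.

```python
# Back-calculation from the reported percentages over 400 questions.
# Assumption: the 74 % adjusted figure counts an answer as correct only
# when its explanation is consistent with it.
total = 400
correct_raw = round(0.8075 * total)      # answers initially scored correct
inconsistent = round(0.1425 * total)     # answers contradicting their explanation
correct_adjusted = round(0.74 * total)   # answers correct after adjustment

# Number of inconsistent responses that had originally been scored correct.
reclassified = correct_raw - correct_adjusted
print(correct_raw, inconsistent, correct_adjusted, reclassified)
# → 323 57 296 27
```

Under this reading, 27 of the 57 inconsistent responses had initially been scored correct, which is exactly the gap between 80.75 % and 74 %; the clean integers suggest the percentages were computed on the full 400-question set.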

CONCLUSIONS

This study reveals ChatGPT's capabilities and limitations in nursing exam preparation, underscoring its potential as an auxiliary educational tool. It highlights the model's varied performance across question types and notable inconsistencies between its answers and explanations. The findings contribute to the understanding of artificial intelligence in learning environments and can guide the development of more effective and reliable artificial intelligence-based educational technologies.

TWEETABLE ABSTRACT

New study reveals ChatGPT's potential and challenges in nursing education: Achieves 80.75 % accuracy in exam prep but faces hurdles with complex questions and logical consistency. #AIinNursing #AIinEducation #NursingExams #ChatGPT.

