Ng Jamie Qiao Xin, Chua Joelle Yan Xin, Choolani Mahesh, Li Sarah W L, Foo Lin, Pereira Travis Lanz-Brian, Shorey Shefaly
Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore.
Department of Obstetrics and Gynaecology, National University Hospital, Singapore; Department of Obstetrics and Gynaecology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Obstetrics and Gynaecology, National University Centre for Women and Children (NUWoC), National University Health System, Singapore.
Nurse Educ Pract. 2025 Aug;87:104488. doi: 10.1016/j.nepr.2025.104488. Epub 2025 Jul 25.
This study aimed to evaluate the performance of publicly available large language models (LLMs), namely ChatGPT-4o, ChatGPT-4o Mini and Perplexity AI, in responding to research-related questions at the undergraduate nursing level. The evaluation was conducted across different platforms and prompt structures. The research questions were categorized according to Bloom's taxonomy to compare the quality of AI-generated responses across cognitive levels. Additionally, the study explored the perspectives of research team members on using AI tools to teach foundational research concepts to undergraduate nursing students.
Large language models (LLMs) could help nursing students learn foundational research concepts, but their performance in answering research-related questions has not been explored.
An exploratory case study was conducted to evaluate the performance of ChatGPT-4o, ChatGPT-4o Mini and Perplexity AI in answering 41 research-related questions.
Three different prompts were tested (Prompt-1: unstructured, with no context; Prompt-2: structured, from the professor's perspective; Prompt-3: structured, from the student's perspective). A validated, author-developed 5-point Likert-type scale was used to assess all AI-generated responses across six domains: Accuracy, Relevance, Clarity & Structure, Examples Provided, Critical Thinking and Referencing.
All three AI models generated higher-quality responses when structured prompts were used compared with unstructured prompts and responded well across the different Bloom's taxonomy levels. ChatGPT-4o and ChatGPT-4o Mini performed better at answering research-related questions than Perplexity AI.
AI models hold promise as supplementary tools for enhancing undergraduate nursing students' understanding of foundational research concepts. Further studies are warranted to evaluate their impact on specific research-related learning outcomes within nursing education.