Beyond the Bot: A Dual-Phase Framework for Evaluating AI Chatbot Simulations in Nursing Education.

Author Information

Olla Phillip, Wodwaski Nadine, Long Taylor

Affiliations

College of Health Professions, University of Detroit Mercy, Detroit, MI 48221, USA.

McAuley School of Nursing, University of Detroit Mercy, Detroit, MI 48221, USA.

Publication Information

Nurs Rep. 2025 Jul 31;15(8):280. doi: 10.3390/nursrep15080280.

Abstract

The integration of AI chatbots into nursing education, particularly in simulation-based learning, is advancing rapidly. However, structured evaluation models, especially for assessing AI-generated simulations, are lacking. This article introduces the AI-Integrated Method for Simulation (AIMS) evaluation framework, a dual-phase framework adapted from the FAITA model and designed to evaluate both prompt design and chatbot performance in the context of nursing education. This simulation-based study explored the application of an AI chatbot in an emergency planning course. The AIMS framework was developed and applied, consisting of six prompt-level domains (Phase 1) and eight performance criteria (Phase 2). These domains were selected based on current best practices in instructional design, simulation fidelity, and the emerging AI evaluation literature. To assess the chatbot's educational utility, the study employed a scoring rubric for each phase and incorporated a structured feedback loop to refine both prompt design and chatbot interaction. To demonstrate the framework's practical application, the researchers configured an AI tool, referred to in this study as "Eval-Bot v1" and built using OpenAI's GPT-4.0, to apply the Phase 1 scoring criteria to a real simulation prompt. Insights from this analysis were then used to anticipate Phase 2 performance and identify areas for improvement. Three participants, all experienced healthcare educators and advanced practice nurses with expertise in clinical decision-making and simulation-based teaching, reviewed the prompt and Eval-Bot's scores to triangulate the findings. Simulated evaluations revealed clear strengths in the prompt's alignment with course objectives and in its capacity to foster interactive learning. Participants noted that the AI chatbot supported engagement and maintained appropriate pacing, particularly in scenarios involving emergency planning decision-making. However, challenges emerged in personalization and inclusivity. While the chatbot responded consistently to general queries, it struggled to adapt tone, complexity, and content to reflect diverse learner needs or cultural nuances. To support replication and refinement, a sample scoring rubric and a simulation prompt template are provided. When the prompt was evaluated with the Eval-Bot tool, moderate concerns were flagged regarding safety prompts and inclusive language, particularly in how the chatbot navigated sensitive decision points. These gaps were linked to predicted performance issues in Phase 2 domains such as dialog control, equity, and user reassurance. Based on these findings, revised prompt strategies were developed to improve contextual sensitivity, promote inclusivity, and strengthen ethical guidance within chatbot-led simulations. The AIMS evaluation framework provides a practical and replicable approach to evaluating AI chatbots in simulation-based education. By offering structured criteria for both prompt design and chatbot performance, the model supports instructional designers, simulation specialists, and developers in identifying areas of strength and improvement. The findings underscore the importance of intentional design, safety monitoring, and inclusive language when integrating AI into nursing and health education. As AI tools become more embedded in learning environments, this framework offers a thoughtful starting point for ensuring they are applied ethically, effectively, and with learner diversity in mind.
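The abstract references a Phase 1 scoring rubric and an "Eval-Bot v1" built on GPT-4.0 but reproduces neither, so the following is a minimal sketch of how such a rubric-driven scorer might be wired up. The six domain names, the DomainScore type, and the score_prompt function are illustrative assumptions based only on themes named in the abstract, not the AIMS instrument itself; the sketch assumes the official openai Python SDK (v1.x) with an OPENAI_API_KEY set in the environment.

    # Sketch of an Eval-Bot-style Phase 1 pass: ask a GPT-4 model to score
    # one simulation prompt against a rubric of prompt-level domains.
    # The domain names below are placeholders, NOT the paper's six domains.
    from dataclasses import dataclass
    from openai import OpenAI  # official openai Python SDK (v1.x)

    PHASE1_DOMAINS = [
        "alignment with course objectives",
        "interactivity and pacing",
        "personalization",
        "inclusive language",
        "safety prompting",
        "ethical guidance",
    ]

    @dataclass
    class DomainScore:
        """One rubric entry; a full tool would parse model output into these."""
        domain: str
        score: int      # e.g., 1 (weak) to 5 (strong)
        rationale: str

    def score_prompt(simulation_prompt: str, model: str = "gpt-4") -> str:
        """Return the model's raw rubric scores for a simulation prompt."""
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        rubric = "\n".join(
            f"- {d}: rate 1-5 and briefly justify" for d in PHASE1_DOMAINS
        )
        response = client.chat.completions.create(
            model=model,
            messages=[
                {
                    "role": "system",
                    "content": (
                        "You are an evaluator applying a prompt-design rubric "
                        "for nursing-simulation chatbots. Score each domain:\n"
                        + rubric
                    ),
                },
                {"role": "user", "content": simulation_prompt},
            ],
        )
        return response.choices[0].message.content

In the spirit of the framework's structured feedback loop, low-scoring domains from a pass like this would then drive prompt revision before the Phase 2 performance evaluation.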

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a6cd/12389130/7dd1bb1b2158/nursrep-15-00280-g001.jpg
