Beyond the Bot: A Dual-Phase Framework for Evaluating AI Chatbot Simulations in Nursing Education.

Author information

Phillip Olla, Nadine Wodwaski, Taylor Long

Affiliations

College of Health Professions, University of Detroit Mercy, Detroit, MI 48221, USA.

McAuley School of Nursing, University of Detroit Mercy, Detroit, MI 48221, USA.

Publication information

Nurs Rep. 2025 Jul 31;15(8):280. doi: 10.3390/nursrep15080280.

DOI: 10.3390/nursrep15080280
PMID: 40863667
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12389130/
Abstract

The integration of AI chatbots into nursing education, particularly simulation-based learning, is advancing rapidly. However, structured evaluation models are lacking, especially for assessing AI-generated simulations. This article introduces the AI-Integrated Method for Simulation (AIMS) evaluation framework, a dual-phase framework adapted from the FAITA model and designed to evaluate both prompt design and chatbot performance in nursing education.

This simulation-based study explored the application of an AI chatbot in an emergency planning course. The AIMS framework was developed and applied, consisting of six prompt-level domains (Phase 1) and eight performance criteria (Phase 2). These domains were selected based on current best practices in instructional design, simulation fidelity, and the emerging AI evaluation literature. To assess the chatbot's educational utility, the study employed a scoring rubric for each phase and incorporated a structured feedback loop to refine both prompt design and chatbot interaction. To demonstrate the framework's practical application, the researchers configured an AI tool, referred to in this study as "Eval-Bot v1" and built using OpenAI's GPT-4.0, to apply the Phase 1 scoring criteria to a real simulation prompt. Insights from this analysis were then used to anticipate Phase 2 performance and identify areas for improvement. Three participants, all experienced healthcare educators and advanced practice nurses with expertise in clinical decision-making and simulation-based teaching, reviewed the prompt and Eval-Bot's score to triangulate the findings.

Simulated evaluations revealed clear strengths in the prompt's alignment with course objectives and its capacity to foster interactive learning. Participants noted that the AI chatbot supported engagement and maintained appropriate pacing, particularly in scenarios involving emergency planning decisions. However, challenges emerged in personalization and inclusivity. While the chatbot responded consistently to general queries, it struggled to adapt its tone, complexity, and content to diverse learner needs or cultural nuances. Evaluation with the Eval-Bot tool flagged moderate concerns regarding safety prompts and inclusive language, particularly in how the chatbot navigated sensitive decision points. These gaps were linked to predicted performance issues in Phase 2 domains such as dialog control, equity, and user reassurance. Based on these findings, revised prompt strategies were developed to improve contextual sensitivity, promote inclusivity, and strengthen ethical guidance within chatbot-led simulations. To support replication and refinement, a sample scoring rubric and simulation prompt template are provided.

The AIMS evaluation framework provides a practical, replicable approach for evaluating the use of AI chatbots in simulation-based education. By offering structured criteria for both prompt design and chatbot performance, the model helps instructional designers, simulation specialists, and developers identify strengths and areas for improvement. The findings underscore the importance of intentional design, safety monitoring, and inclusive language when integrating AI into nursing and health education. As AI tools become more embedded in learning environments, this framework offers a thoughtful starting point for ensuring they are applied ethically, effectively, and with learner diversity in mind.
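The abstract outlines the framework's structure: six prompt-level domains scored in Phase 1, eight performance criteria in Phase 2, a rubric for each phase, and Phase 1 gaps used to anticipate Phase 2 issues. The Python sketch that follows shows one minimal, hypothetical way to represent that structure. It assumes a 1-5 scale and a flag threshold of 3. Neither the scale, the threshold, the Phase 1 domain names, nor the Phase 1 to Phase 2 mapping comes from the article. Only dialog control, equity, and user reassurance are Phase 2 domains that the abstract actually names, and the example scores are illustrative values rather than the study's data.

```python
from dataclasses import dataclass, field

# Hypothetical 1-5 rubric scale and flag threshold; the abstract does not
# specify the scale the AIMS rubric actually uses.
SCALE_MIN, SCALE_MAX = 1, 5
FLAG_THRESHOLD = 3  # scores at or below this are treated as a concern


@dataclass
class PhaseRubric:
    """One phase of a two-phase rubric: a fixed set of domains, each scored once."""
    name: str
    domains: list[str]
    scores: dict[str, int] = field(default_factory=dict)

    def score(self, domain: str, value: int) -> None:
        # Reject unknown domains and out-of-range scores.
        if domain not in self.domains:
            raise KeyError(f"unknown domain: {domain}")
        if not SCALE_MIN <= value <= SCALE_MAX:
            raise ValueError(f"score out of range: {value}")
        self.scores[domain] = value

    def unscored(self) -> list[str]:
        return [d for d in self.domains if d not in self.scores]

    def flagged(self) -> list[str]:
        # Only scored domains can be flagged; unscored ones are reported separately.
        return [d for d in self.domains
                if d in self.scores and self.scores[d] <= FLAG_THRESHOLD]


# Phase 1: six prompt-level domains. These names are hypothetical placeholders.
phase1 = PhaseRubric("Phase 1: prompt design", [
    "objective_alignment", "interactivity", "safety_prompts",
    "inclusive_language", "prompt_domain_5", "prompt_domain_6",
])

# Phase 2: eight performance criteria. The first three are named in the
# abstract; the remaining five are placeholders.
phase2 = PhaseRubric("Phase 2: chatbot performance", [
    "dialog_control", "equity", "user_reassurance",
    "criterion_4", "criterion_5", "criterion_6", "criterion_7", "criterion_8",
])

# Hypothetical mapping from Phase 1 gaps to the Phase 2 criteria they are
# expected to affect.
PREDICTED_LINKS = {
    "safety_prompts": ["dialog_control", "user_reassurance"],
    "inclusive_language": ["equity"],
}


def predict_phase2_risks(p1: PhaseRubric) -> list[str]:
    """Return Phase 2 criteria expected to be at risk, given flagged Phase 1 domains."""
    risks = set()
    for domain in p1.flagged():
        risks.update(PREDICTED_LINKS.get(domain, []))
    return sorted(risks)


# Illustrative scores only, not the study's data.
for domain, value in {"objective_alignment": 5, "interactivity": 5,
                      "safety_prompts": 3, "inclusive_language": 3,
                      "prompt_domain_5": 4, "prompt_domain_6": 4}.items():
    phase1.score(domain, value)

print(phase1.flagged())               # ['safety_prompts', 'inclusive_language']
print(predict_phase2_risks(phase1))   # ['dialog_control', 'equity', 'user_reassurance']
```

Flagged Phase 1 domains point to the Phase 2 criteria worth watching. This is a simplified version of the abstract's use of Phase 1 results to anticipate Phase 2 performance. A real application would substitute the published rubric's domains, scale, and scoring guidance.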

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a6cd/12389130/7dd1bb1b2158/nursrep-15-00280-g001.jpg

References cited in this article

1. Is This Chatbot Safe and Evidence-Based? A Call for the Critical Evaluation of Generative AI Mental Health Chatbots. J Particip Med. 2025 May 29;17:e69534. doi: 10.2196/69534.
2. A Review of Large Language Models in Medical Education, Clinical Decision Support, and Healthcare Administration. Healthcare (Basel). 2025 Mar 10;13(6):603. doi: 10.3390/healthcare13060603.
3. Exploring the impact on faculty of the American Association of Colleges of Nursing's The Essentials: Core Competencies for Professional Nursing Education. J Prof Nurs. 2025 Mar-Apr;57:139-147. doi: 10.1016/j.profnurs.2025.02.003. Epub 2025 Feb 7.
4. Advancing Clinical Chatbot Validation Using AI-Powered Evaluation With a New 3-Bot Evaluation System: Instrument Validation Study. JMIR Nurs. 2025 Feb 27;8:e63058. doi: 10.2196/63058.
5. Exploring the Ethical Challenges of Conversational AI in Mental Health Care: Scoping Review. JMIR Ment Health. 2025 Feb 21;12:e60432. doi: 10.2196/60432.
6. Describing the Framework for AI Tool Assessment in Mental Health and Applying It to a Generative AI Obsessive-Compulsive Disorder Platform: Tutorial. JMIR Form Res. 2024 Oct 18;8:e62963. doi: 10.2196/62963.
7. The Framework for AI Tool Assessment in Mental Health (FAITA - Mental Health): a scale for evaluating AI-powered mental health tools. World Psychiatry. 2024 Oct;23(3):444-445. doi: 10.1002/wps.21248.
8. Approaches to Evaluating Digital Health Technologies: Scoping Review. J Med Internet Res. 2024 Aug 28;26:e50251. doi: 10.2196/50251.
9. Evaluation framework for conversational agents with artificial intelligence in health interventions: a systematic scoping review. J Am Med Inform Assoc. 2024 Feb 16;31(3):746-761. doi: 10.1093/jamia/ocad222.
10. Artificial Intelligence-Based Chatbots for Promoting Health Behavioral Changes: Systematic Review. J Med Internet Res. 2023 Feb 24;25:e40789. doi: 10.2196/40789.