Development and Validation of a Large Language Model-Powered Chatbot for Neurosurgery: Mixed Methods Study on Enhancing Perioperative Patient Education.

Authors

Ho Chung Man, Guan Shaowei, Mok Prudence Kwan-Lam, Lam Candice Hw, Ho Wai Ying, Mak Calvin Hoi-Kwan, Qin Harry, Wong Arkers Kwan Ching, Hui Vivian

Affiliations

Neurosurgery Department, Queen Elizabeth Hospital, Kowloon, China (Hong Kong).

Department of Electrical and Electronic Engineering, Hong Kong Polytechnic University, Kowloon, China (Hong Kong).

Publication information

J Med Internet Res. 2025 Jul 15;27:e74299. doi: 10.2196/74299.

Abstract

BACKGROUND

Perioperative education is crucial for optimizing outcomes in neuroendovascular procedures, where inadequate understanding can heighten patient anxiety and hinder care plan adherence. Current education models, reliant on traditional consultations and printed materials, often lack scalability and personalization. Artificial intelligence (AI)-powered chatbots have demonstrated efficacy in various health care contexts; however, their role in neuroendovascular perioperative support remains underexplored. Given the complexity of neuroendovascular procedures and the need for continuous patient education, AI chatbots have the potential to offer tailored perioperative guidance in this specialty.

OBJECTIVE

We aimed to develop, validate, and assess NeuroBot, an AI-driven system that uses large language models (LLMs) with retrieval-augmented generation to deliver timely, accurate, and evidence-based responses to patient inquiries in neurosurgery, ultimately improving the effectiveness of patient education.

METHODS

A mixed methods approach was used, consisting of 3 phases. In the first phase, internal validation, we compared the performance of Assistants API, ChatGPT, and Qwen by evaluating their responses to 306 bilingual neuroendovascular-related questions. The accuracy, relevance, and completeness of the responses were evaluated using a Likert scale; statistical analyses included ANOVA and paired t tests. In the second phase, external validation, 10 neurosurgical experts rated the responses generated by NeuroBot using the same evaluation metrics applied in the internal validation phase. The consistency of their ratings was measured using the intraclass correlation coefficient. Finally, in the third phase, a qualitative study was conducted through interviews with 18 health care providers, which helped identify key themes related to NeuroBot's usability and perceived benefits. Thematic analysis was performed using NVivo, and interrater reliability was confirmed through Cohen κ.
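The abstract does not describe how Cohen κ was computed for the thematic coding, so the following is only a minimal sketch of the standard two-rater formula, κ = (p_o − p_e) / (1 − p_e). The function name `cohen_kappa`, the theme labels, and the two coders' assignments are hypothetical illustrations, not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical theme codes assigned by two coders to six interview excerpts.
coder_1 = ["workload", "education", "education", "trust", "workload", "trust"]
coder_2 = ["workload", "education", "trust", "trust", "workload", "trust"]
kappa = cohen_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the coders agree on 5 of 6 excerpts (p_o ≈ 0.83) while chance agreement is 1/3, so κ is lower than the raw agreement rate; that correction for chance is why κ is usually reported instead of simple percent agreement.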

RESULTS

The Assistants API outperformed both ChatGPT and Qwen, achieving a mean accuracy score of 5.28 out of 6 (95% CI 5.21-5.35), a statistically significant difference (P<.001). External expert ratings for NeuroBot demonstrated significant improvements, with scores of 5.70 out of 6 (95% CI 5.46-5.94) for accuracy, 5.58 out of 6 (95% CI 5.45-5.94) for relevance, and 2.70 out of 3 (95% CI 2.73-2.97) for completeness. Qualitative insights highlighted NeuroBot's potential to reduce staff workload, enhance patient education, and deliver evidence-based responses.

CONCLUSIONS

NeuroBot, leveraging LLMs with the retrieval-augmented generation technique, demonstrates the potential of LLM-based chatbots in perioperative neuroendovascular care, offering scalable and continuous support. By integrating domain-specific knowledge, NeuroBot simplifies communication between professionals and patients while ensuring patients have 24/7 access to reliable, evidence-based information. Further refinement and research will enhance NeuroBot's ability to foster patient-centered communication, optimize clinical outcomes, and advance AI-driven innovations in health care delivery.
