Xu Ruiyu, Hong Ying, Zhang Feifei, Xu Hongmei
College of Nursing, Binzhou Medical University, Yantai, Shandong Province, China.
Department of Breast Surgery, Ningbo No. 2 Hospital, Ningbo, Zhejiang Province, China.
Sci Rep. 2024 Dec 28;14(1):30794. doi: 10.1038/s41598-024-81052-3.
Breast cancer is one of the most common malignant tumors in women worldwide. Although large language models (LLMs) can provide breast cancer nursing care consultation, their inherent hallucinations can lead to inaccurate responses. Retrieval-augmented generation (RAG) technology can improve LLM performance, offering a new approach for clinical applications. In the present study, we evaluated the performance of an LLM in breast cancer nursing care using RAG technology. In the control group (GPT-4), questions were answered directly by the GPT-4 model, whereas the experimental group (RAG-GPT) used the GPT-4 model combined with RAG. A curated knowledge base for breast cancer nursing care, comprising textbooks, guidelines, and traditional Chinese therapy, was created for the RAG-GPT group, and 15 questions randomly selected from a pool of 200 real-world clinical care questions were answered. The primary endpoint was overall satisfaction, and the secondary endpoints were accuracy and empathy. The RAG-GPT group showed significantly higher overall satisfaction than the GPT-4 group (8.4 ± 0.84 vs. 5.4 ± 1.27, p < 0.01) as well as more accurate responses (8.6 ± 0.69 vs. 5.6 ± 0.96, p < 0.01). However, empathy did not differ significantly between the groups (8.4 ± 0.85 vs. 7.8 ± 1.22, p > 0.05). Overall, this study indicates that RAG technology can significantly improve LLM performance, likely by increasing the accuracy of the answers without diminishing empathy. These findings provide a theoretical basis for applying RAG technology to LLMs in clinical nursing practice and education.
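The abstract does not describe how retrieval was implemented, so the following is only a minimal sketch of the general RAG idea: pick the knowledge-base passage most similar to the question and place it in the prompt ahead of the question. Everything in it is an assumption made for illustration, including the three stand-in passages, the bag-of-words cosine similarity, and the hypothetical helper names `retrieve` and `build_prompt`. The call to the language model itself is left out.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in passages; the study's actual knowledge base
# (textbooks, guidelines, traditional Chinese therapy) is not reproduced here.
KNOWLEDGE_BASE = [
    "After breast surgery, keep the drain site clean and dry and record the drainage volume daily.",
    "Arm exercises after mastectomy should begin gradually to reduce the risk of lymphedema.",
    "During chemotherapy, small frequent meals can help patients manage nausea.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, passages, k=1):
    """Return the k passages most similar to the question."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q_vec, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages, k=1):
    """Assemble a prompt that puts the retrieved context before the question."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return (
        "Answer the nursing question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt(
        "How can I reduce lymphedema risk after mastectomy?",
        KNOWLEDGE_BASE,
    ))
```

In this sketch the control condition would correspond to sending the question alone, and the experimental condition to sending the output of `build_prompt`. A production system would more likely use dense embeddings and a vector index than word counts.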