Kim Junseo, Kim Seok Jun, Ahn Junseok, Lee Suehyun
Department of Computer Engineering, College of IT Convergence, Gachon University, Seongnam, Korea.
Department of IT Convergence, Graduate School, Gachon University, Seongnam, Korea.
Healthc Inform Res. 2025 Apr;31(2):136-145. doi: 10.4258/hir.2025.31.2.136. Epub 2025 Apr 30.
This research aimed to develop a retrieval-augmented generation (RAG) based large language model (LLM) system that offers personalized and reliable responses to a wide range of concerns raised by Korean adolescents. The work focuses on building a dataset that reflects the Korean cultural context, designing the system, and validating its effectiveness by comparing the answer quality of RAG-based and non-RAG models.
Data were collected from the NAVER Knowledge iN platform, concentrating on posts that featured adolescents' questions and corresponding expert responses during the period 2014-2024. The dataset comprises 3,874 cases, categorized by key negative emotions and the primary sources of worry. The data were processed to remove irrelevant or redundant content and then classified into general and detailed causes. The RAG-based model employed FAISS for similarity-based retrieval of the top three reference cases and used GPT-4o mini for response generation. The responses generated with and without RAG were evaluated using several metrics.
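The retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the embedding model, the FAISS index type, or the prompt wording, so the brute-force cosine search below only stands in for what a similarity index would return, and the function names, the `question`/`answer` field names, and the prompt text are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_cases(query_vec, case_vecs, k=3):
    """Return indices of the k stored cases most similar to the query.

    Brute-force stand-in for a similarity index such as FAISS;
    ties are broken by the lower index.
    """
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(case_vecs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


def build_prompt(question, cases, indices):
    """Assemble a generation prompt from the retrieved reference cases.

    The wording is an assumption; the abstract does not give the prompt.
    """
    references = "\n\n".join(
        f"Reference case {n}:\nQ: {cases[i]['question']}\nA: {cases[i]['answer']}"
        for n, i in enumerate(indices, start=1)
    )
    return (
        f"{references}\n\n"
        f"Adolescent's question: {question}\n"
        "Respond with specific, empathetic, and actionable guidance."
    )
```

In a full pipeline, the vectors would come from a sentence-embedding model, the search would be delegated to a FAISS index built over the 3,874 cases, and the assembled prompt would be sent to GPT-4o mini. The non-RAG baseline corresponds to sending the question without the reference cases.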
RAG-based responses outperformed non-RAG responses across all evaluation metrics. Key findings indicate that RAG-based responses delivered more specific, empathetic, and actionable guidance, particularly when addressing complex emotional and situational concerns. The analysis revealed that family relationships, peer interactions, and academic stress are significant factors affecting adolescents' worries, with depression and stress frequently co-occurring.
This study demonstrates the potential of RAG-based LLMs to address the diverse and culture-specific worries of Korean adolescents. By integrating external knowledge and offering personalized support, the proposed system provides a scalable approach to enhancing mental health interventions for adolescents. Future research should concentrate on expanding the dataset and improving multiturn conversational capabilities to deliver even more comprehensive support.