LLM-Based Response Generation for Korean Adolescents: A Study Using the NAVER Knowledge iN Q&A Dataset with RAG.

Author information

Kim Junseo, Kim Seok Jun, Ahn Junseok, Lee Suehyun

Affiliations

Department of Computer Engineering, College of IT Convergence, Gachon University, Seongnam, Korea.

Department of IT Convergence, Graduate School, Gachon University, Seongnam, Korea.

Publication information

Healthc Inform Res. 2025 Apr;31(2):136-145. doi: 10.4258/hir.2025.31.2.136. Epub 2025 Apr 30.

DOI:10.4258/hir.2025.31.2.136

PMID:40384065

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12086440/

Abstract

OBJECTIVES

This research aimed to develop a retrieval-augmented generation (RAG) based large language model (LLM) system that offers personalized and reliable responses to a wide range of concerns raised by Korean adolescents. Our work focuses on building a culturally reflective dataset and on designing and validating the system's effectiveness by comparing the answer quality of RAG-based models with non-RAG models.

METHODS

Data were collected from the NAVER Knowledge iN platform, concentrating on posts that featured adolescents' questions and corresponding expert responses during the period 2014-2024. The dataset comprises 3,874 cases, categorized by key negative emotions and the primary sources of worry. The data were processed to remove irrelevant or redundant content and then classified into general and detailed causes. The RAG-based model employed FAISS for similarity-based retrieval of the top three reference cases and used GPT-4o mini for response generation. The responses generated with and without RAG were evaluated using several metrics.
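
The retrieval step described above (select the three most similar reference cases, then pass them to the generator) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the embedding model, index configuration, or prompt, so the toy bag-of-words `embed` function, the exhaustive cosine search standing in for FAISS, the `build_prompt` template, and the sample cases are all assumptions made up for illustration. The call to GPT-4o mini is omitted; the sketch stops at the assembled prompt.

```python
import math
from collections import Counter

# Invented sample cases for illustration only; not taken from the study's dataset.
SAMPLE_CASES = [
    {"id": 0, "question": "I fight with my parents every day about my grades",
     "answer": "Try agreeing on one calm time each week to talk about school."},
    {"id": 1, "question": "My friends ignore me at school",
     "answer": "Consider telling one trusted friend how this feels."},
    {"id": 2, "question": "I feel stressed about exams and my grades",
     "answer": "Break revision into short blocks and schedule rest."},
    {"id": 3, "question": "I cannot sleep before exams",
     "answer": "Keep a fixed bedtime and stop studying an hour earlier."},
    {"id": 4, "question": "How do I choose a university major",
     "answer": "List subjects you enjoy and talk to a school counselor."},
]

def embed(text):
    """Toy bag-of-words vector; a hypothetical stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_top_k(query, cases, k=3):
    """Exhaustive similarity search; a vector index such as FAISS would replace this loop."""
    q = embed(query)
    scored = [(cosine(q, embed(case["question"])), idx) for idx, case in enumerate(cases)]
    # Highest similarity first; ties broken by original position for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cases[idx] for _, idx in scored[:k]]

def build_prompt(query, references):
    """Assemble a generation prompt (hypothetical template) from the retrieved cases."""
    lines = ["Use the reference cases below to answer the adolescent's concern.", ""]
    for n, ref in enumerate(references, start=1):
        lines.append(f"Reference {n}: Q: {ref['question']} A: {ref['answer']}")
    lines += ["", f"Concern: {query}", "Answer:"]
    return "\n".join(lines)

if __name__ == "__main__":
    user_query = "I am stressed about my exams"
    top_cases = retrieve_top_k(user_query, SAMPLE_CASES, k=3)
    print(build_prompt(user_query, top_cases))
```

In the non-RAG condition, the same concern would be sent to the model without the reference lines, which is the comparison the evaluation rests on.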

RESULTS

RAG-based responses outperformed non-RAG responses across all evaluation metrics. Key findings indicate that RAG-based responses delivered more specific, empathetic, and actionable guidance, particularly when addressing complex emotional and situational concerns. The analysis revealed that family relationships, peer interactions, and academic stress are significant factors affecting adolescents' worries, with depression and stress frequently co-occurring.

CONCLUSIONS

This study demonstrates the potential of RAG-based LLMs to address the diverse and culture-specific worries of Korean adolescents. By integrating external knowledge and offering personalized support, the proposed system provides a scalable approach to enhancing mental health interventions for adolescents. Future research should concentrate on expanding the dataset and improving multiturn conversational capabilities to deliver even more comprehensive support.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fce5/12086440/69f237b972d9/hir-2025-31-2-136f1.jpg
