Using Large Language Models to Detect Depression From User-Generated Diary Text Data as a Novel Approach in Digital Mental Health Screening: Instrument Validation Study.

Affiliations

Department of Psychiatry, Anam Hospital, Korea University, Seoul, Republic of Korea.

Doctorpresso, Seoul, Republic of Korea.

Publication information

J Med Internet Res. 2024 Sep 18;26:e54617. doi: 10.2196/54617.

DOI:10.2196/54617

PMID:39292502

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11447422/

Abstract

BACKGROUND

Depressive disorders have substantial global implications, leading to various social consequences, including decreased occupational productivity and a high disability burden. Early detection and intervention for clinically significant depression have gained attention; however, the existing depression screening tools, such as the Center for Epidemiologic Studies Depression Scale, have limitations in objectivity and accuracy. Therefore, researchers are identifying objective indicators of depression, including image analysis, blood biomarkers, and ecological momentary assessments (EMAs). Among EMAs, user-generated text data, particularly from diary writing, have emerged as a clinically significant and analyzable source for detecting or diagnosing depression, leveraging advancements in large language models such as ChatGPT.

OBJECTIVE

We aimed to detect depression from user-generated diary text, collected through an emotional diary writing app, using a large language model (LLM). We also aimed to validate the value of semistructured diary text data as an EMA data source.

METHODS

Participants were assessed for depression using the Patient Health Questionnaire, and suicide risk was evaluated using the Beck Scale for Suicide Ideation, both before starting and after completing the 2-week diary writing period. The text data from the daily diaries were also used in the analysis. The performance of leading LLMs, such as ChatGPT with GPT-3.5 and GPT-4, was assessed with and without GPT-3.5 fine-tuning on the training data set. The model performance comparison involved the use of chain-of-thought and zero-shot prompting to analyze the text structure and content.
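The difference between the two prompting styles named above can be sketched as follows. This is a minimal illustration only: the prompt wording, the function names (`zero_shot_prompt`, `chain_of_thought_prompt`, `parse_label`), and the answer format are assumptions made for this example and are not the prompts used in the study. No model is called; the sketch only builds prompt strings and parses a reply.

```python
from typing import Optional

# Hypothetical task instruction; the study's actual prompt is not given in the abstract.
TASK = (
    "You will read one diary entry. Decide whether the writer is likely "
    "to be at risk of depression."
)
ANSWER_FORMAT = "Finish with a final line: 'Answer: depressed' or 'Answer: not depressed'."


def zero_shot_prompt(diary: str) -> str:
    """Zero-shot: ask for the label directly, with no examples and no reasoning request."""
    return f"{TASK}\n{ANSWER_FORMAT}\n\nDiary entry:\n{diary}"


def chain_of_thought_prompt(diary: str) -> str:
    """Chain-of-thought: ask the model to reason step by step before giving the label."""
    return (
        f"{TASK}\n"
        "Think step by step: first describe the mood and events in the entry, "
        "then weigh the evidence for and against depression.\n"
        f"{ANSWER_FORMAT}\n\nDiary entry:\n{diary}"
    )


def parse_label(response: str) -> Optional[bool]:
    """Read the label from the last non-empty line; None if it cannot be parsed."""
    lines = [ln.strip().lower() for ln in response.splitlines() if ln.strip()]
    if not lines:
        return None
    last = lines[-1]
    if "not depressed" in last:  # check the negative form first
        return False
    if "depressed" in last:
        return True
    return None


if __name__ == "__main__":
    print(chain_of_thought_prompt("I could not get out of bed today."))
```

Parsing only the final line keeps any step-by-step reasoning in a chain-of-thought reply from being mistaken for the answer itself.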

RESULTS

We used 428 diaries from 91 participants. Fine-tuned GPT-3.5 demonstrated superior performance in depression detection, achieving an accuracy of 0.902 and a specificity of 0.955. However, balanced accuracy was highest (0.844) for GPT-3.5 without fine-tuning or prompting techniques; this configuration displayed a recall of 0.929.
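The metrics reported here can rank models differently, which is why one configuration leads on accuracy and another on balanced accuracy. The sketch below uses the standard definitions and assumes balanced accuracy is the mean of recall (sensitivity) and specificity; the abstract does not state its formula. The confusion-matrix counts are hypothetical and are not the study's data. Under the same assumption, the reported balanced accuracy of 0.844 and recall of 0.929 would imply a specificity of about 0.759 for GPT-3.5 without fine-tuning; that figure is derived here and is not reported in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts; 'positive' means depression was flagged."""
    tp: int
    fp: int
    tn: int
    fn: int


def recall(c: Confusion) -> float:
    """Share of truly depressed cases that were flagged (sensitivity)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """Share of non-depressed cases that were correctly not flagged."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: Confusion) -> float:
    """Share of all cases classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def balanced_accuracy(c: Confusion) -> float:
    """Mean of recall and specificity (assumed definition)."""
    return (recall(c) + specificity(c)) / 2


def implied_specificity(balanced_acc: float, rec: float) -> float:
    """Solve balanced_acc = (rec + spec) / 2 for spec."""
    return 2 * balanced_acc - rec


# Hypothetical counts for illustration only; NOT taken from the study.
example = Confusion(tp=8, fp=5, tn=85, fn=2)

if __name__ == "__main__":
    print(f"accuracy          = {accuracy(example):.3f}")
    print(f"recall            = {recall(example):.3f}")
    print(f"specificity       = {specificity(example):.3f}")
    print(f"balanced accuracy = {balanced_accuracy(example):.3f}")
    # Derived from the abstract's figures under the assumed definition.
    print(f"implied specificity = {implied_specificity(0.844, 0.929):.3f}")
```

With imbalanced classes, as in the hypothetical counts, accuracy is dominated by the majority (non-depressed) class, while balanced accuracy weights missed depressed cases as heavily as false alarms.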

CONCLUSIONS

Both GPT-3.5 and GPT-4.0 demonstrated relatively reasonable performance in recognizing the risk of depression based on diaries. Our findings highlight the potential clinical usefulness of user-generated text data for detecting depression. In addition to measurable indicators, such as step count and physical activity, future research should increasingly emphasize qualitative digital expression.
