Department of Psychiatry, Anam Hospital, Korea University, Seoul, Republic of Korea.
Doctorpresso, Seoul, Republic of Korea.
J Med Internet Res. 2024 Sep 18;26:e54617. doi: 10.2196/54617.
BACKGROUND: Depressive disorders have substantial global implications, leading to various social consequences, including decreased occupational productivity and a high disability burden. Early detection of and intervention for clinically significant depression have gained attention; however, existing depression screening tools, such as the Center for Epidemiologic Studies Depression Scale, have limitations in objectivity and accuracy. Researchers are therefore seeking objective indicators of depression, including image analysis, blood biomarkers, and ecological momentary assessments (EMAs). Among EMAs, user-generated text data, particularly from diary writing, have emerged as a clinically significant and analyzable source for detecting or diagnosing depression, leveraging advances in large language models (LLMs) such as ChatGPT.
OBJECTIVE: We aimed to use an LLM to detect depression from user-generated diary text collected through an emotional diary writing app. We also aimed to validate the value of semistructured diary text data as an EMA data source.
METHODS: Participants were assessed for depression using the Patient Health Questionnaire, and suicide risk was evaluated using the Beck Scale for Suicide Ideation, before starting and after completing the 2-week diary writing period. The text data from the daily diaries were also used in the analysis. The performance of leading LLMs, such as ChatGPT with GPT-3.5 and GPT-4, was assessed with and without GPT-3.5 fine-tuning on the training data set. The models were compared using chain-of-thought and zero-shot prompting to analyze the text structure and content.
RESULTS: We used 428 diaries from 91 participants. The fine-tuned GPT-3.5 model performed best in depression detection, achieving an accuracy of 0.902 and a specificity of 0.955. However, the balanced accuracy was highest (0.844) for GPT-3.5 without fine-tuning or prompting techniques; this configuration had a recall of 0.929.
CONCLUSIONS: Both GPT-3.5 and GPT-4 demonstrated reasonable performance in recognizing the risk of depression based on diaries. Our findings highlight the potential clinical usefulness of user-generated text data for detecting depression. In addition to measurable indicators, such as step count and physical activity, future research should place increasing emphasis on qualitative digital expression.
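The metrics reported above (accuracy, specificity, recall, and balanced accuracy) are related through the confusion matrix. The following is a minimal sketch, assuming the usual binary-classification definitions, in which balanced accuracy is the mean of recall and specificity; the abstract does not state which definition the authors used. The function names and the toy counts are hypothetical and are not taken from the study.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)            # sensitivity: depressed cases correctly flagged
    specificity = tn / (tn + fp)       # non-depressed cases correctly cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    balanced_accuracy = (recall + specificity) / 2
    return {
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


def implied_specificity(balanced_accuracy: float, recall: float) -> float:
    """Solve balanced_accuracy = (recall + specificity) / 2 for specificity."""
    return 2 * balanced_accuracy - recall


# Toy counts for illustration only (hypothetical, not study data).
toy = confusion_metrics(tp=9, fp=2, tn=8, fn=1)

# Under the assumed definition, the reported balanced accuracy (0.844)
# and recall (0.929) would imply a specificity of about 0.759.
spec = implied_specificity(0.844, 0.929)
```

Under this assumed definition, the configuration with the highest balanced accuracy gains recall at the cost of specificity relative to the fine-tuned model (specificity 0.955), which is why a single headline accuracy figure can hide the trade-off in a screening setting.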