Using Large Language Models to Understand Suicidality in a Social Media-Based Taxonomy of Mental Health Disorders: Linguistic Analysis of Reddit Posts.

Affiliations

Department of Psychology, University of Georgia, Athens, GA, United States.

Digital Health, IBM Research, New York, NY, United States.

Publication information

JMIR Ment Health. 2024 May 16;11:e57234. doi: 10.2196/57234.

Abstract

BACKGROUND

Rates of suicide have increased by over 35% since 1999. Despite concerted efforts, our ability to predict, explain, or treat suicide risk has not significantly improved over the past 50 years.

OBJECTIVE

The aim of this study was to use large language models to understand natural language use during public web-based discussions (on Reddit) around topics related to suicidality.

METHODS

We used large language model-based sentence embedding to extract the latent linguistic dimensions of user postings derived from several mental health-related subreddits, with a focus on suicidality. We then applied dimensionality reduction to these sentence embeddings, allowing them to be summarized and visualized in a lower-dimensional Euclidean space for further downstream analyses. We analyzed 2.9 million posts extracted from 30 subreddits, including r/SuicideWatch, between October 1 and December 31, 2022, and the same period in 2010.
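The pipeline above has two stages: a sentence embedding for each post, then a reduction of those embeddings into a low-dimensional Euclidean space. The abstract does not name the encoder or the reduction method, so the following is only a minimal, self-contained sketch under stated assumptions. `toy_embed` is a hypothetical hashed bag-of-words stand-in for an LLM sentence encoder. Principal component analysis, computed by power iteration, stands in for whatever reduction the authors actually applied. The sketch shows the shape of the computation: text goes in, each post becomes one vector, and each post comes out as two coordinates.

```python
import hashlib
import math
import random

DIM = 64  # hypothetical width; real sentence encoders output hundreds of dimensions


def toy_embed(text, dim=DIM):
    """Hypothetical stand-in for an LLM sentence encoder: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # md5 gives a bucket that is stable across runs, unlike Python's salted hash()
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def _center(rows):
    """Subtract column means so the principal axes pass through the origin."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mean[j] for j in range(d)] for r in rows]


def _leading_axis(rows, iters=200, seed=0):
    """Approximate the top eigenvector of X^T X by power iteration."""
    rng = random.Random(seed)
    d = len(rows[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        scores = [sum(r[j] * v[j] for j in range(d)) for r in rows]  # X v
        w = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(d)]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left to explain
            break
        v = [x / norm for x in w]
    return v


def pca_2d(vectors):
    """Project embeddings onto their first two principal components."""
    X = _center(vectors)
    d = len(X[0])
    columns = []
    for _ in range(2):
        axis = _leading_axis(X)
        scores = [sum(r[j] * axis[j] for j in range(d)) for r in X]
        columns.append(scores)
        # Deflate: remove the explained direction before finding the next axis
        X = [[r[j] - s * axis[j] for j in range(d)] for r, s in zip(X, scores)]
    return list(zip(columns[0], columns[1]))


posts = ["feeling alone tonight", "is there anyone to talk to",
         "tips for better sleep", "had a good day today"]
coords = pca_2d([toy_embed(p) for p in posts])
print(coords)
```

In a real pipeline, `toy_embed` would be replaced by an actual sentence-embedding model, and PCA might be replaced by a nonlinear method such as UMAP. The downstream analyses then operate on the low-dimensional coordinates rather than on the raw text.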

RESULTS

Our results showed that, in line with existing theories of suicide, posters in the suicidality community (r/SuicideWatch) predominantly wrote about feelings of disconnection, burdensomeness, hopelessness, desperation, resignation, and trauma. Further, we identified distinct latent linguistic dimensions (well-being, seeking support, and severity of distress) across all mental health subreddits. Many of the resulting subreddit clusters aligned with a statistically driven diagnostic classification system, namely the Hierarchical Taxonomy of Psychopathology (HiTOP), by mapping onto its proposed superspectra.
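The abstract reports that subreddits formed clusters in the reduced space but does not say how those clusters were obtained. As a hedged illustration, the following sketch assumes one plausible approach: average each subreddit's post coordinates into a centroid, then merge centroids with average-linkage agglomerative clustering until `k` groups remain. The function names, the linkage choice, and the toy coordinates (labelled `sub_a` through `sub_e`) are all hypothetical and do not reproduce the study's data or method.

```python
import math


def centroids(points_by_group):
    """Mean 2-D position of each group's posts (e.g., one group per subreddit)."""
    out = {}
    for name, pts in points_by_group.items():
        n = len(pts)
        out[name] = (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)
    return out


def average_linkage(cents, k):
    """Agglomerative clustering with average linkage until k clusters remain."""
    clusters = [[name] for name in sorted(cents)]

    def linkage(c1, c2):
        # Mean Euclidean distance over all cross-cluster pairs of centroids
        total = sum(math.dist(cents[a], cents[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical toy coordinates, not data from the study
TOY = {
    "sub_a": [(0.0, 0.1), (0.2, 0.0)],
    "sub_b": [(0.1, 0.2), (0.0, 0.0)],
    "sub_c": [(5.0, 5.1), (5.2, 4.9)],
    "sub_d": [(4.9, 5.0), (5.1, 5.2)],
    "sub_e": [(10.0, 0.0), (10.2, 0.1)],
}
print(average_linkage(centroids(TOY), 3))
# → [['sub_a', 'sub_b'], ['sub_c', 'sub_d'], ['sub_e']]
```

Average linkage is used here because a single outlying subreddit sways it less than single linkage does. In practice, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would replace the hand-written loop.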

CONCLUSIONS

Overall, our findings provide data-driven support for several language-based theories of suicide, as well as dimensional classification systems for mental health disorders. Ultimately, this novel combination of natural language processing techniques can assist researchers in gaining deeper insights about emotions and experiences shared on the web and may aid in the validation and refutation of different mental health theories.
