Understanding Mental Health Issues in Different Subdomains of Social Networking Services: Computational Analysis of Text-Based Reddit Posts.

Affiliations

Department of Applied Artificial Intelligence, Sungkyunkwan University, Seoul, Republic of Korea.

Teach Company, Seoul, Republic of Korea.

Publication information

J Med Internet Res. 2023 Nov 30;25:e49074. doi: 10.2196/49074.


DOI:10.2196/49074
PMID:38032730
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10722371/
Abstract

BACKGROUND: Users increasingly use social networking services (SNSs) to share their feelings and emotions. For those with mental disorders, SNSs can also be used to seek advice on mental health issues. One available SNS is Reddit, in which users can freely discuss such matters on relevant health diagnostic subreddits.

OBJECTIVE: In this study, we analyzed the distinctive linguistic characteristics in users' posts on specific mental disorder subreddits (depression, anxiety, bipolar disorder, borderline personality disorder, schizophrenia, autism, and mental health) and further validated their distinctiveness externally by comparing them with posts of subreddits not related to mental illness. We also confirmed that these differences in linguistic formulations can be learned through a machine learning process.

METHODS: Reddit posts uploaded by users were collected for our research. We used various statistical analysis methods in Linguistic Inquiry and Word Count (LIWC) software, including 1-way ANOVA and subsequent post hoc tests, to see sentiment differences in various lexical features within mental health-related subreddits and against unrelated ones. We also applied 3 supervised and unsupervised clustering methods for both cases after extracting textual features from posts on each subreddit using bidirectional encoder representations from transformers (BERT) to ensure that our data set is suitable for further machine learning or deep learning tasks.

RESULTS: We collected 3,133,509 posts of 919,722 Reddit users. The results using the data indicated that there are notable linguistic differences among the subreddits, consistent with the findings of prior research. The findings from LIWC analyses revealed that patients with each mental health issue show significantly different lexical and semantic patterns, such as word count or emotion, throughout their online social networking activities, with P<.001 for all cases. Furthermore, distinctive features of each subreddit group were successfully identified through supervised and unsupervised clustering methods, using the BERT embeddings extracted from textual posts. This distinctiveness was reflected in the Davies-Bouldin scores ranging from 0.222 to 0.397 and the silhouette scores ranging from 0.639 to 0.803 in the former case, with scores of 1.638 and 0.729, respectively, in the latter case.

CONCLUSIONS: By taking a multifaceted approach, analyzing textual posts related to mental health issues using statistical, natural language processing, and machine learning techniques, our approach provides insights into aspects of recent lexical usage and information about the linguistic characteristics of patients with specific mental health issues, which can inform clinicians about patients' mental health in diagnostic terms to aid online intervention. Our findings can further promote research areas involving linguistic analysis and machine learning approaches for patients with mental health issues by identifying and detecting mentally vulnerable groups of people online.
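The two cluster-quality measures reported in the results, the silhouette score and the Davies-Bouldin score, can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not say which implementation they used, the function names below are hypothetical, and toy 2D points stand in for the BERT embeddings. A silhouette score closer to 1 and a Davies-Bouldin score closer to 0 both indicate compact, well-separated clusters.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (range -1 to 1)."""
    clusters = group_by_label(points, labels)
    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(euclidean(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def davies_bouldin_score(points, labels):
    """Mean, over clusters, of the worst scatter-to-separation ratio."""
    clusters = group_by_label(points, labels)
    dims = len(points[0])
    centroids = {
        key: [sum(p[d] for p in members) / len(members) for d in range(dims)]
        for key, members in clusters.items()
    }
    # Scatter: mean distance from each member to its cluster centroid
    scatter = {
        key: sum(euclidean(p, centroids[key]) for p in members) / len(members)
        for key, members in clusters.items()
    }
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / euclidean(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy stand-in for embeddings: two tight, well-separated groups of 2D points.
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11)]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]

print("silhouette:", round(silhouette_score(toy_points, toy_labels), 3))
print("Davies-Bouldin:", round(davies_bouldin_score(toy_points, toy_labels), 3))
```

On real embeddings, library implementations such as those in scikit-learn would normally be preferred; the pure-Python version here only makes the definitions explicit.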

