Using Natural Language Processing Methods to Build the Hypersexuality in Bipolar Reddit Corpus: Infodemiology Study of Reddit.

Author Information

Harvey Daisy, Rayson Paul, Lobban Fiona, Palmier-Claus Jasper, Dolman Clare, Chataigné Anne, Jones Steven

Affiliations

Spectrum Centre for Mental Health Research, Division of Health Research, Lancaster University, Lancaster, United Kingdom.

School of Computing and Communications, Lancaster University, Lancaster, United Kingdom.

Publication Information

JMIR Infodemiology. 2025 Mar 6;5:e65632. doi: 10.2196/65632.

Abstract

BACKGROUND

Bipolar is a severe mental health condition affecting at least 2% of the global population, with clinical observations suggesting that individuals experiencing elevated mood states, such as mania or hypomania, may have an increased propensity for engaging in risk-taking behaviors, including hypersexuality. Hypersexuality has historically been stigmatized in society and in health care provision, which makes it more difficult for service users to talk about their behaviors. There is a need for greater understanding of hypersexuality to develop better evidence-based treatment, support, and training for health professionals.

OBJECTIVE

This study aimed to develop and assess effective methods for identifying Reddit posts about hypersexuality written by people with a self-reported bipolar diagnosis. Using natural language processing techniques, we built a specialized dataset, the Talking About Bipolar on Reddit Corpus (TABoRC). We then used various computational tools to filter and categorize posts that mentioned hypersexuality, forming the Hypersexuality in Bipolar Reddit Corpus (HiB-RC). This paper introduces a novel methodology for detecting hypersexuality-related conversations on Reddit and offers both methodological insights and preliminary findings, laying the groundwork for further research in this emerging field.
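The two-stage filtering described above (first identify users who self-report a bipolar diagnosis, then keep those users' posts that mention hypersexuality) can be sketched as follows. This is a minimal illustration under stated assumptions: the diagnosis regex, the keyword list `HYPERSEXUALITY_TERMS`, and the helper names `self_reports_bipolar`, `mentions_hypersexuality`, and `build_corpora` are all hypothetical. The abstract does not specify the study's actual detection rules.

```python
import re

# Hypothetical pattern for a first-person diagnosis statement; NOT the study's rule.
DIAGNOSIS_PATTERN = re.compile(
    r"\bi\s+(?:was|am|have\s+been|got)\s+diagnosed\s+with\s+bipolar\b",
    re.IGNORECASE,
)

# Hypothetical, illustrative keyword list ("hypersexual" also covers "hypersexuality").
HYPERSEXUALITY_TERMS = ("hypersexual", "sex drive", "libido")


def self_reports_bipolar(posts):
    """Return True if any of a user's posts contains a diagnosis statement."""
    return any(DIAGNOSIS_PATTERN.search(p) for p in posts)


def mentions_hypersexuality(post):
    """Case-insensitive substring match against the keyword list."""
    text = post.lower()
    return any(term in text for term in HYPERSEXUALITY_TERMS)


def build_corpora(posts_by_user):
    """Split a {user: [posts]} mapping into TABoRC-like and HiB-RC-like dicts."""
    # Stage 1: keep every post from users with a self-reported diagnosis.
    taborc = {u: ps for u, ps in posts_by_user.items() if self_reports_bipolar(ps)}
    # Stage 2: within those users, keep only posts that match a keyword.
    hib = {}
    for user, ps in taborc.items():
        matches = [p for p in ps if mentions_hypersexuality(p)]
        if matches:
            hib[user] = matches
    return taborc, hib


# Invented example posts, not data from the study.
posts = {
    "user_a": ["I was diagnosed with bipolar in 2015.",
               "My libido goes through the roof when I'm hypomanic."],
    "user_b": ["Anyone watching the game tonight?"],
    "user_c": ["I was diagnosed with bipolar last year.",
               "New meds are helping my sleep."],
}
taborc, hib = build_corpora(posts)
print(sorted(taborc))  # → ['user_a', 'user_c']
print(hib)             # → {'user_a': ["My libido goes through the roof when I'm hypomanic."]}
```

A plain keyword match like this is only a starting point: "sex drive", for example, also matches posts about low libido, which is one reason a real pipeline would likely add further categorization or manual review.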

METHODS

A toolbox of computational linguistic methods was used to create the corpora and infer demographic variables for the Redditors in the dataset. The key psychological domains in the corpus were measured using Linguistic Inquiry and Word Count, and a topic model was built using BERTopic to identify salient language clusters. This paper also discusses ethical considerations associated with this type of analysis.
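LIWC scores most of its categories as the percentage of words in a text that match a category's dictionary entries, where an entry ending in `*` matches any word beginning with that stem. The sketch below reproduces that scoring idea with a tiny hypothetical dictionary (`TOY_DICTIONARY`). The real LIWC dictionaries are proprietary, and summary measures such as tone are computed differently, so this only illustrates the mechanism.

```python
import re
from collections import Counter

# Toy LIWC-style dictionary; categories and entries are illustrative assumptions,
# not the real LIWC dictionary. A trailing "*" means "any word with this prefix".
TOY_DICTIONARY = {
    "sexual": ["sex*", "libido", "horny"],
    "wellness": ["sleep*", "exercis*", "healthy"],
}

TOKEN = re.compile(r"[a-z']+")


def matches(word, entry):
    """Prefix match for wildcard entries, exact match otherwise."""
    return word.startswith(entry[:-1]) if entry.endswith("*") else word == entry


def category_percentages(text, dictionary=TOY_DICTIONARY):
    """Percentage of tokens in `text` that match each category (a word may count
    toward several categories, but at most once per category)."""
    words = TOKEN.findall(text.lower())
    if not words:
        return {cat: 0.0 for cat in dictionary}
    counts = Counter()
    for word in words:
        for cat, entries in dictionary.items():
            if any(matches(word, e) for e in entries):
                counts[cat] += 1
    return {cat: 100 * counts[cat] / len(words) for cat in dictionary}


print(category_percentages("I can't sleep and my libido is through the roof"))
# → {'sexual': 10.0, 'wellness': 10.0}
```

Topic modeling is a separate step; in the `bertopic` package the core call is `BERTopic().fit_transform(docs)`, which returns a topic assignment for each document together with topic probabilities.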

RESULTS

The TABoRC is a corpus of 6,679,485 posts from 5177 Redditors, and the HiB-RC is a corpus of 2146 posts from 816 Redditors. Between 2012 and 2021, posts in the HiB-RC increased by an average of 91.65% per year (SD 119.6%), compared with 48.14% in the TABoRC (SD 51.2%), and users increased by an average of 86.97% per year (SD 93.8%), compared with 27.17% in the TABoRC (SD 38.7%). These statistics suggest that posting activity related to hypersexuality grew faster than general Reddit use over the same period. Several key psychological domains differed significantly in the HiB-RC compared with the TABoRC (P<.001), including a more negative tone, more discussion of sex, and less discussion of wellness. Finally, BERTopic was used to identify 9 key topics in the dataset.
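The results report an "average yearly increase" together with an SD. One plausible reading, assumed here rather than confirmed by the abstract, is the mean and sample standard deviation of the year-over-year percentage changes in counts. The counts in `example_counts` are invented for illustration and are not the study's data.

```python
from statistics import mean, stdev


def yearly_increases(counts_by_year):
    """Percentage change between each pair of consecutive years.
    Assumes every year except the last has a nonzero count."""
    years = sorted(counts_by_year)
    return [
        100 * (counts_by_year[curr] - counts_by_year[prev]) / counts_by_year[prev]
        for prev, curr in zip(years, years[1:])
    ]


def summarize(counts_by_year):
    """Mean and sample SD of the yearly percentage changes (needs >= 3 years)."""
    changes = yearly_increases(counts_by_year)
    return mean(changes), stdev(changes)


# Hypothetical post counts per year, NOT the study's data.
example_counts = {2019: 100, 2020: 150, 2021: 300}
avg, sd = summarize(example_counts)
print(f"{avg:.2f}% (SD {sd:.1f}%)")  # → 75.00% (SD 35.4%)
```

Because each rate is relative to the previous year, small early counts can yield very large percentages, so an SD larger than the mean signals that growth varied widely from year to year.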

CONCLUSIONS

Hypersexuality is an important symptom that people with bipolar discuss on Reddit, and it needs to be systematically recognized as part of this illness. This research demonstrates the utility of a computational linguistic framework and offers a high-level overview of hypersexuality in bipolar, providing empirical evidence that paves the way for a deeper understanding of hypersexuality from a lived experience perspective.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/24b4/11926447/51851fe11f2e/infodemiology_v5i1e65632_fig1.jpg