Semantic Linkages of Obsessions From an International Obsessive-Compulsive Disorder Mobile App Data Set: Big Data Analytics Study.

Affiliations

Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, United States.

Centre for Addiction and Mental Health, Department of Psychiatry, University of Toronto, Toronto, ON, Canada.

Publication information

J Med Internet Res. 2021 Jun 21;23(6):e25482. doi: 10.2196/25482.

Abstract

BACKGROUND

Obsessive-compulsive disorder (OCD) is characterized by recurrent intrusive thoughts, urges, or images (obsessions) and repetitive physical or mental behaviors (compulsions). Previous factor analytic and clustering studies suggest the presence of three or four subtypes of OCD symptoms. However, these studies have relied on predefined symptom checklists, which are limited in breadth and may be biased toward researchers' previous conceptualizations of OCD.

OBJECTIVE

In this study, we examine a large data set of freely reported obsession symptoms obtained from an OCD mobile app as an alternative approach to uncovering potential OCD subtypes. From these data, we derive data-driven clusters of obsessions based on their latent semantic relationships in the English language using word embeddings.

METHODS

We extracted free-text words describing obsessions from a large sample of users of a mobile app, NOCD. Semantic vector space modeling was applied using the Global Vectors for Word Representation (GloVe) algorithm. A domain-specific extension, Mittens, was also applied to enhance the corpus with OCD-specific words. The resulting representations provided linear substructures of the word vectors in a 100-dimensional space. We applied principal component analysis to the 100-dimensional vector representations of the most frequent words, followed by k-means clustering to obtain clusters of related words.
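The last step of this pipeline (principal component analysis followed by k-means) can be illustrated with a minimal NumPy sketch. It is not the authors' code. The vectors are synthetic stand-ins for 100-dimensional word embeddings, and the choice of two retained components, the farthest-first initialization, and the names `pca_project` and `kmeans` are assumptions made only for illustration.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-first initialization."""
    centroids = [X[0]]
    for _ in range(k - 1):
        # Distance of every point to its nearest already-chosen centroid.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Synthetic stand-ins for word embeddings (NOT data from the study):
# three groups of 20 vectors in a 100-dimensional space.
rng = np.random.default_rng(0)
group_centers = 5.0 * rng.normal(size=(3, 100))
vectors = np.vstack(
    [c + 0.5 * rng.normal(size=(20, 100)) for c in group_centers]
)

# Assumption: keep 2 components; the abstract does not state how many were kept.
reduced = pca_project(vectors, n_components=2)
labels, _ = kmeans(reduced, k=3)
print(labels)
```

With real embeddings, `vectors` would hold one row per frequent word, and the cluster labels would be mapped back to the words for interpretation.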

RESULTS

We obtained 7001 unique words representing obsessions from 25,369 individuals. Heuristics for determining the optimal number of clusters pointed to a three-cluster solution for grouping subtypes of OCD. The first had themes relating to relationship and just-right; the second had themes relating to doubt and checking; and the third had themes relating to contamination, somatic, physical harm, and sexual harm. All three clusters showed close semantic relationships with each other in the central area of convergence, with themes relating to harm. An equal-sized split-sample analysis across individuals and a split-sample analysis over time both showed overall stable cluster solutions. Words in the third cluster were the most frequently occurring words, followed by words in the first cluster.
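The abstract does not say which heuristics were used to choose the number of clusters. One common option is the mean silhouette score, sketched below under that assumption. The two-dimensional points are synthetic rather than study data, and `kmeans` and `mean_silhouette` are illustrative helper names.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with farthest-first initialization; returns labels."""
    centroids = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels

def mean_silhouette(X, labels):
    """Average silhouette: (b - a) / max(a, b) per point, 0 for singletons."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: score stays 0
        # a: mean distance to the other members of the point's own cluster.
        a = D[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Synthetic 2-D points in three well-separated groups (NOT study data).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(30, 2)) for c in centers])

# Score each candidate number of clusters and keep the best one.
scores = {k: mean_silhouette(X, kmeans(X, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

A higher mean silhouette indicates tighter, better-separated clusters. In practice several heuristics are usually compared, since they do not always agree.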

CONCLUSIONS

The clustering of naturally acquired obsessional words resulted in three major groupings of semantic themes, which partially overlapped with predefined checklists from previous studies. Furthermore, the closeness of the overall embedded relationships across clusters and their central convergence on harm suggest that, at least at the level of self-reported obsessional thoughts, most obsessions have close semantic relationships. Harm to self or others may be an underlying organizing theme across many obsessions. Notably, relationship-themed words, not previously included in factor-analytic studies, clustered with just-right words. These novel insights have potential implications for understanding how an apparent multitude of obsessional symptoms are connected by underlying themes. This observation could aid exposure-based treatment approaches and could be used as a conceptual framework for future research.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/af57/8277342/7e4f6623ce24/jmir_v23i6e25482_fig1.jpg
