Scientific Computing Program, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil.
San Diego State University, School of Social Work, San Diego, CA, United States.
JMIR Infodemiology. 2024 Sep 13;4:e51156. doi: 10.2196/51156.
The growing availability of large volumes of data spontaneously generated by users of social media platforms makes natural language processing (NLP) methods valuable tools for understanding the opioid crisis.
We aimed to understand how NLP has been applied to Reddit (Reddit Inc) data to study opioid use.
We systematically searched PubMed, Scopus, PsycINFO, ACL Anthology, IEEE Xplore, and the Association for Computing Machinery data repositories for peer-reviewed studies and conference abstracts published up to July 19, 2022. Studies were included if they investigated opioid use, used NLP techniques to analyze textual corpora, and used Reddit as the social media data source. We specifically mapped the studies' overarching goals and findings, the methodologies and software used, and their main limitations.
In total, 30 studies were included and classified into 4 nonmutually exclusive overarching goal categories: methodological (n=6, 20%), infodemiology (n=22, 73%), infoveillance (n=7, 23%), and pharmacovigilance (n=3, 10%). NLP methods were used to identify content relevant to opioid use within vast quantities of textual data, to establish potential relationships between opioid use patterns or profiles and contextual factors or comorbidities, and to anticipate individuals' transitions between different opioid-related subreddits, which likely reveal progression through stages of opioid use. The most commonly used techniques were embeddings (12/30, 40%), prediction or classification approaches (12/30, 40%), topic modeling (9/30, 30%), and sentiment analysis (6/30, 20%). The most frequently used programming languages were Python (20/30, 67%) and R (2/30, 7%). Among the studies that reported limitations (20/30, 67%), the most frequently cited was uncertainty about whether the redditors participating in these forums are representative of people who use opioids (8/20, 40%). The literature was very recent: most papers (28/30, 93%) were published between 2019 and 2022, and their authors came from a range of disciplines.
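Because the 4 goal categories are nonmutually exclusive, their counts and percentages add up to more than the 30 included studies. The short Python sketch below, using only the counts reported above, recomputes each share and shows that the category assignments total 38 and the percentages sum to 126%. The variable and function names are illustrative only.

```python
# Goal-category counts as reported in the abstract; one study may belong to several categories.
TOTAL_STUDIES = 30
goal_counts = {
    "methodological": 6,
    "infodemiology": 22,
    "infoveillance": 7,
    "pharmacovigilance": 3,
}


def percent(count: int, total: int) -> int:
    """Return count/total as a whole-number percentage using Python's built-in round()."""
    return round(100 * count / total)


# Share of the 30 studies falling in each category.
percentages = {goal: percent(n, TOTAL_STUDIES) for goal, n in goal_counts.items()}

# Overlapping categories: total assignments exceed the number of studies.
total_assignments = sum(goal_counts.values())

for goal, pct in percentages.items():
    print(f"{goal}: n={goal_counts[goal]}, {pct}%")
print(f"category assignments: {total_assignments} across {TOTAL_STUDIES} studies")
print(f"sum of percentages: {sum(percentages.values())}%")
```

The same arithmetic applies to the technique counts (embeddings, classification, topic modeling, sentiment analysis), which likewise overlap across studies.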
This scoping review identified a wide variety of NLP techniques and applications used to support surveillance and social media interventions addressing the opioid crisis. Although these methods clearly have the potential to identify and analyze opioid-relevant content on Reddit, the interpretive meaning they can provide is limited. Moreover, we identified a need for standardized ethical guidelines governing the use of Reddit data to safeguard the anonymity and privacy of people who use these forums.