Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, Georgia, USA.
College of Arts and Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
J Am Med Inform Assoc. 2020 Feb 1;27(2):315-329. doi: 10.1093/jamia/ocz162.
Prescription medication (PM) misuse and abuse is a major health problem globally, and a number of recent studies have explored social media as a resource for monitoring nonmedical PM use. Our objectives are to present a methodological review of social media-based studies that monitor PM abuse or misuse, and to propose a potentially generalizable, data-centric processing pipeline for curating data from this resource.
We identified studies involving social media, PMs, and misuse or abuse (inclusion criteria) from Medline, Embase, Scopus, Web of Science, and Google Scholar. We categorized studies based on multiple characteristics including but not limited to data size; social media source(s); medications studied; and primary objectives, methods, and findings.
A total of 39 studies met our inclusion criteria, with 31 (∼79.5%) published since 2015. Twitter has been the most popular resource, with Reddit and Instagram gaining popularity recently. Early studies focused mostly on manual, qualitative analyses, with a growing trend toward the use of data-centric methods involving natural language processing and machine learning.
There is a paucity of standardized, data-centric frameworks for curating social media data for task-specific analyses and near real-time surveillance of nonmedical PM use. Many existing studies do not quantify inter-annotator agreement for manual annotation tasks or account for the presence of noise in the data.
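As an illustration of what quantifying agreement on a manual annotation task can involve, the following minimal sketch computes Cohen's kappa for two annotators. It is not drawn from any of the reviewed studies: the function name, the label set ("misuse", "other"), and the ten example annotations are hypothetical and chosen only for demonstration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal proportions, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of ten posts (illustrative values only).
annotator_1 = ["misuse", "misuse", "other", "other", "misuse",
               "other", "other", "other", "misuse", "other"]
annotator_2 = ["misuse", "other", "other", "other", "misuse",
               "other", "misuse", "other", "misuse", "other"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # prints 0.583
```

In this example the annotators agree on 8 of 10 posts (0.80 observed agreement), while chance agreement from their label frequencies is 0.52, so kappa is (0.80 - 0.52) / (1 - 0.52) ≈ 0.583. Reporting a chance-corrected statistic of this kind alongside raw agreement helps indicate how reliable a manually annotated dataset is.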
The development of reproducible and standardized data-centric frameworks that build on the current state-of-the-art methods in data and text mining may enable effective utilization of social media data for understanding and monitoring nonmedical PM use.