Pharmacovigilance, Bayer AG, Müllerstr. 170, 13353, Berlin, Germany.
Uppsala Monitoring Centre, Uppsala, Sweden.
Drug Saf. 2020 May;43(5):467-478. doi: 10.1007/s40264-020-00912-9.
INTRODUCTION AND OBJECTIVE: Social media has been suggested as a source for safety information, supplementing existing safety surveillance data sources. This article summarises the activities undertaken, and the associated challenges, to create a benchmark reference dataset that can be used to evaluate the performance of automated methods and systems for adverse event recognition.
METHODS: A retrospective analysis of public English-language Twitter posts (Tweets) was performed. We sampled 57,473 Tweets out of 5,645,336 Tweets created between 1 March, 2012 and 1 March, 2015 that mentioned at least one of six medicinal products of interest (insulin glargine, levetiracetam, methylphenidate, sorafenib, terbinafine, zolpidem). Products, adverse events, indications, product-event combinations, and product-indication combinations were extracted and coded by two independent teams of safety reviewers.
RESULTS: The benchmark reference dataset consisted of 1056 positive controls ("adverse event Tweets") and 56,417 negative controls ("non-adverse event Tweets"). The 1056 adverse event Tweets contained 1396 product-event combinations referring to personal adverse event experiences, comprising 292 different MedDRA Preferred Terms. Of these product-event combinations, 1171 (83.9%) were confined to four MedDRA System Organ Classes. Of the adverse event Tweets, 195 (18.5%) contained indication information, comprising 25 different Preferred Terms.
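The reported proportions can be rechecked from the counts given in the abstract. The following is a minimal sketch in Python; the variable names and the `pct` helper are illustrative assumptions and are not part of the published dataset or its tooling.

```python
# Counts as reported in the abstract (variable names are illustrative).
total_tweets = 5_645_336          # Tweets mentioning a product of interest
sampled_tweets = 57_473           # Tweets sampled for manual review
ae_tweets = 1_056                 # positive controls
non_ae_tweets = 56_417            # negative controls
pec_total = 1_396                 # product-event combinations
pec_in_four_socs = 1_171          # combinations within four System Organ Classes
tweets_with_indication = 195      # adverse event Tweets with indication information


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Positive and negative controls together make up the full sample.
assert ae_tweets + non_ae_tweets == sampled_tweets

print(pct(pec_in_four_socs, pec_total))        # 83.9
print(pct(tweets_with_indication, ae_tweets))  # 18.5
print(pct(ae_tweets, sampled_tweets))          # 1.8
print(pct(sampled_tweets, total_tweets))       # 1.0
```

The last two figures are not stated in the abstract but follow directly from its counts: adverse event Tweets make up about 1.8% of the sample, and the sample covers about 1.0% of the Tweets that mentioned a product of interest.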
CONCLUSIONS: A manually curated benchmark reference dataset based on Twitter data has been created and made available to the research community to evaluate the performance of automated methods and systems for adverse event recognition in unstructured free-text information.
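As an illustration of how such a benchmark might be used, the sketch below scores a binary adverse event classifier against reference labels. It is a hypothetical example: the `evaluate` function, the `Scores` type, and the toy label lists are assumptions made for illustration, and the abstract does not prescribe any particular metric or file format.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Scores:
    precision: float
    recall: float
    f1: float


def evaluate(gold: Sequence[bool], predicted: Sequence[bool]) -> Scores:
    """Compare predicted labels with reference labels (True = adverse event Tweet)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1)


# Toy labels for five Tweets (hypothetical, not taken from the dataset).
gold = [True, True, False, False, False]
predicted = [True, False, True, False, False]
scores = evaluate(gold, predicted)
print(scores)  # Scores(precision=0.5, recall=0.5, f1=0.5)
```

Precision, recall, and F1 are shown here rather than accuracy because the classes are heavily imbalanced: with 56,417 of 57,473 Tweets being negative controls, a system that never flags an adverse event would still reach about 98.2% accuracy.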