Medical Informatics Division, Case Western Reserve, Cleveland, Ohio, USA.
BMC Bioinformatics. 2014 Jan 15;15:17. doi: 10.1186/1471-2105-15-17.
Independent data sources can be used to augment post-marketing drug safety signal detection. The vast amount of publicly available biomedical literature contains rich side effect information for drugs at all clinical stages. In this study, we present a large-scale signal boosting approach that combines over 4 million records in the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) and over 21 million biomedical articles.
The datasets comprise 4,285,097 records from FAERS and 21,354,075 MEDLINE articles. We first extracted all drug-side effect (SE) pairs from FAERS. We implemented a total of seven signal ranking algorithms and compared them before and after they were boosted with signals from MEDLINE sentences or abstracts. Finally, we manually curated all drug-cardiovascular (CV) pairs that appeared in both data sources and investigated whether our approach could detect true signals that have not yet been included in FDA drug labels.
We extracted a total of 2,787,797 drug-SE pairs from FAERS, with a low initial precision of 0.025. The ranking algorithm that combined signals from both FAERS and MEDLINE significantly improved the precision for top-ranked pairs from 0.025 to 0.371, a 13.8 fold elevation in precision. Manual curation showed that drug-SE pairs appearing in both data sources were highly enriched with true signals, many of which have not yet been included in FDA drug labels.
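The boosting idea described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the study: the abstract does not name its seven ranking algorithms, so a smoothed proportional reporting ratio (PRR) stands in as one common disproportionality score; the boosting rule (pairs also found in the literature are ranked first) is a deliberate simplification; and the reports, literature pairs, and gold standard are invented toy data.

```python
from collections import Counter


def prr_scores(reports):
    """Smoothed proportional reporting ratio for each (drug, side_effect) pair.

    `reports` is a list of (drug, side_effect) tuples, one per reported mention.
    PRR is an assumed stand-in; the abstract does not name its algorithms.
    """
    pair_counts = Counter(reports)
    drug_counts = Counter(drug for drug, _ in reports)
    se_counts = Counter(se for _, se in reports)
    total = len(reports)
    scores = {}
    for (drug, se), a in pair_counts.items():
        b = drug_counts[drug] - a   # this drug, other side effects
        c = se_counts[se] - a       # other drugs, this side effect
        d = total - a - b - c       # other drugs, other side effects
        # 0.5 continuity correction avoids division by zero on sparse counts.
        scores[(drug, se)] = ((a + 0.5) / (a + b + 1.0)) / ((c + 0.5) / (c + d + 1.0))
    return scores


def boosted_ranking(scores, literature_pairs):
    """Rank pairs found in the literature first, then by score (assumed rule)."""
    return sorted(scores, key=lambda p: (p in literature_pairs, scores[p]), reverse=True)


def precision_at_k(ranking, known_true, k):
    """Fraction of the top-k ranked pairs that are in the gold standard."""
    top = ranking[:k]
    return sum(pair in known_true for pair in top) / len(top)


# Invented toy data, standing in for FAERS reports and MEDLINE co-occurrences.
reports = (
    [("drugA", "nausea")] * 3 + [("drugA", "headache")]
    + [("drugB", "arrhythmia")] * 2 + [("drugB", "nausea")]
    + [("drugC", "headache")] * 4 + [("drugC", "arrhythmia")]
)
literature_pairs = {("drugB", "arrhythmia"), ("drugC", "arrhythmia"), ("drugA", "headache")}
known_true = {("drugB", "arrhythmia"), ("drugC", "arrhythmia")}

scores = prr_scores(reports)
baseline = sorted(scores, key=scores.get, reverse=True)
boosted = boosted_ranking(scores, literature_pairs)

print("baseline precision@2:", precision_at_k(baseline, known_true, 2))
print("boosted precision@2:", precision_at_k(boosted, known_true, 2))
```

On this toy input, the literature filter moves a low-scoring true pair ahead of high-scoring false ones, which is the kind of effect the precision comparison measures. The size of the gain here says nothing about the gain reported in the study.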
We have developed an efficient and effective approach to ranking and strengthening drug safety signals. We demonstrate that combining information from FAERS and the biomedical literature on a large scale can contribute significantly to drug safety surveillance.