Winnenburg Rainer, Shah Nigam H
Stanford Center for Biomedical Informatics Research, 1265 Welch Road, MSOB, Stanford, CA, 94305, USA.
BMC Bioinformatics. 2016 Jun 23;17:250. doi: 10.1186/s12859-016-1080-z.
Identifying associations between marketed drugs and adverse events in the biomedical literature supports drug safety monitoring efforts. Assessing the significance of such literature-derived associations, and determining the granularity at which they should be captured, remains a challenge. Here, we assess how selecting adverse event terms from the Medical Subject Headings (MeSH) based on their information content can improve the detection of adverse events for drugs and drug classes.
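To make the information-content idea concrete, the sketch below computes IC over a toy hierarchy and maps a term to its nearest ancestor that falls under an IC cutoff. It assumes the common corpus-based definition IC(t) = -log2 p(t), where p(t) is the fraction of annotations made to t or any of its descendants. The hierarchy is a simplified, hypothetical fragment (the MeSH-style names are only labels), the counts are invented, and `information_content`, `aggregate` and the cutoff rule are illustrative assumptions, not the paper's exact selection procedure or the API of the authors' R package.

```python
import math

# Hypothetical, simplified hierarchy fragment: child -> parent (None marks the root).
PARENT = {
    "Hepatitis, Toxic": "Liver Diseases",
    "Liver Failure, Acute": "Liver Diseases",
    "Liver Diseases": "Digestive System Diseases",
    "Pancreatitis": "Digestive System Diseases",
    "Digestive System Diseases": None,
}

# Hypothetical counts of how often each term is used directly as an annotation.
DIRECT_COUNTS = {
    "Hepatitis, Toxic": 40,
    "Liver Failure, Acute": 10,
    "Liver Diseases": 30,
    "Pancreatitis": 20,
    "Digestive System Diseases": 0,
}


def ancestors(term):
    """Return the term itself followed by all of its ancestors, bottom-up."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain


def information_content():
    """IC(t) = -log2 p(t), where p(t) counts annotations to t and its descendants."""
    cumulative = {term: 0 for term in PARENT}
    for term, n in DIRECT_COUNTS.items():
        for anc in ancestors(term):  # each annotation also counts for every ancestor
            cumulative[anc] += n
    total = sum(DIRECT_COUNTS.values())
    return {t: -math.log2(c / total) for t, c in cumulative.items() if c > 0}


def aggregate(term, ic, max_ic):
    """Map a term to its closest ancestor (or itself) whose IC is <= max_ic."""
    chain = ancestors(term)
    for anc in chain:
        if ic.get(anc, math.inf) <= max_ic:
            return anc
    return chain[-1]  # fall back to the root
```

With these toy counts, `aggregate("Hepatitis, Toxic", ic, 1.0)` returns `"Liver Diseases"`, while a looser cutoff of 1.5 keeps the specific term; moving `max_ic` up or down is one way to shift between abstraction levels.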
We analyze a set of 105,354 candidate drug-adverse event pairs extracted from article indexes in MEDLINE. First, we harmonize the extracted adverse event terms by aggregating them into higher-level MeSH terms based on the terms' information content. Then, we determine the statistical enrichment of adverse events associated with drugs and drug classes using a conditional hypergeometric test that adjusts for dependencies among associated terms. We compare our results with methods based on disproportionality analysis (proportional reporting ratio, PRR) and quantify the improvement in signal detection achieved by our generalized enrichment analysis (GEA) approach, using a gold standard of drug-adverse event associations spanning 174 drugs and four events. For single drugs, the best GEA method (precision 0.92, recall 0.71, F1-measure 0.80) outperforms the best PRR-based method (precision 0.69, recall 0.69, F1-measure 0.69) on all four adverse event outcomes in our gold standard. For drug classes, GEA performs similarly (precision 0.85, recall 0.69, F1-measure 0.74) when the level of abstraction for adverse event terms is increased. Finally, on examining the 1609 individual drugs in our MEDLINE set that map to chemical substances in the Anatomical Therapeutic Chemical (ATC) classification, we find signals for 1379 drugs (10,122 unique adverse event associations) when applying GEA with p < 0.005.
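Both kinds of signal compared here can be computed from a 2x2 table of counts for a single drug-event pair. The sketch below assumes a = records mentioning both the drug and the event, b = the drug without the event, c = the event with other drugs, and d = neither; what counts as a "record" (an article or an index pair) is an assumption, and the function names are hypothetical. It implements the plain, unconditional one-sided hypergeometric test, not the conditional variant, which additionally adjusts for dependencies among terms.

```python
from math import comb


def prr(a, b, c, d):
    """Proportional reporting ratio: P(event | drug) / P(event | other drugs)."""
    return (a / (a + b)) / (c / (c + d))


def hypergeom_pvalue(a, b, c, d):
    """One-sided P(X >= a): over-representation of the event among the drug's records."""
    total = a + b + c + d   # population size
    with_event = a + c      # records carrying the event ("successes")
    drawn = a + b           # records involving the drug ("draws")
    upper = min(with_event, drawn)
    tail = sum(
        comb(with_event, x) * comb(total - with_event, drawn - x)
        for x in range(a, upper + 1)
    )
    return tail / comb(total, drawn)
```

For a = 3, b = 1, c = 1, d = 5, `prr` returns 4.5 and `hypergeom_pvalue` returns 25/210 (about 0.119), so this toy pair would not pass the p < 0.005 threshold mentioned above even though its PRR is high.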
We present an approach based on generalized enrichment analysis that can detect associations between drugs, drug classes and adverse events at a given level of granularity while correcting for known dependencies among events. Our study demonstrates the use of GEA and the importance of choosing appropriate abstraction levels to complement current drug safety methods. We provide an R package for exploring alternative abstraction levels of adverse event terms based on information content.
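Correcting for dependencies among events can be illustrated with an elim-style conditional hypergeometric test: terms are tested from most specific to most general, and items already explained by a significant descendant are removed before an ancestor is tested. This is a minimal sketch under stated assumptions. The items (for example, drugs in a class), the toy hierarchy and the function names are hypothetical; removing explained items from both the ancestor's annotations and the universe is one conditioning choice among several; and it is not necessarily the exact procedure implemented by the authors.

```python
from math import comb


def upper_tail(k, total, successes, drawn):
    """Hypergeometric P(X >= k): at least k annotated items among `drawn` items
    sampled without replacement from `total`, of which `successes` are annotated."""
    hi = min(successes, drawn)
    if k > hi:
        return 0.0
    return sum(
        comb(successes, x) * comb(total - successes, drawn - x)
        for x in range(k, hi + 1)
    ) / comb(total, drawn)


def conditional_enrichment(universe, annotations, parent, study, alpha):
    """Bottom-up enrichment test with conditioning on significant descendants.

    universe:    set of all items
    annotations: term -> set of items annotated to the term or its descendants
    parent:      term -> parent term (None for the root)
    study:       set of items of interest
    """
    def depth(term):
        d = 0
        while parent[term] is not None:
            term = parent[term]
            d += 1
        return d

    removed = {term: set() for term in annotations}  # items excluded per term
    pvalues = {}
    # Deepest (most specific) terms are tested first.
    for term in sorted(annotations, key=depth, reverse=True):
        pool = universe - removed[term]
        members = annotations[term] & pool
        drawn = study & pool
        hits = len(members & drawn)
        pvalues[term] = upper_tail(hits, len(pool), len(members), len(drawn))
        if pvalues[term] < alpha:
            # Ancestors are no longer credited with items this term explains.
            anc = parent[term]
            while anc is not None:
                removed[anc] |= members
                anc = parent[anc]
    return pvalues
```

As a toy usage example, take `universe = set(range(1, 21))`, a chain A under P under R with annotations R = 1..20, P = 1..8 and A = 1..5, and `study = {1, 2, 3, 4, 5}`. Term A gets p = 1/15504, and once its items are removed its parent P gets p = 1.0, whereas an unconditional test of P on the full universe would give `upper_tail(5, 20, 8, 5)`, about 0.0036. The conditioning keeps the signal at the most specific term that explains it.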