Automatic construction of a large-scale and accurate drug-side-effect association knowledge base from biomedical literature.

Author information

Xu Rong, Wang QuanQiu

Affiliations

Medical Informatics Program, Center for Clinical Investigation, Case Western Reserve University, Cleveland, OH 44106, United States.

ThinTek, LLC, Palo Alto, CA 94306, United States.

Publication information

J Biomed Inform. 2014 Oct;51:191-9. doi: 10.1016/j.jbi.2014.05.013. Epub 2014 Jun 10.

Abstract

Systems approaches to studying drug-side-effect (drug-SE) associations are emerging as an active research area for drug target discovery, drug repositioning, and drug toxicity prediction. However, currently available drug-SE association databases are far from complete. Herein, to increase the data completeness of current drug-SE relationship resources, we present an automatic learning approach to accurately extract drug-SE pairs from the vast body of published biomedical literature, a rich source of side-effect information for commercial, experimental, and even failed drugs. For the text corpus, we used 119,085,682 MEDLINE sentences and their parse trees. We used known drug-SE associations derived from US Food and Drug Administration (FDA) drug labels as prior knowledge to find relevant sentences and parse trees. We extracted syntactic patterns associated with drug-SE pairs from the resulting set of parse trees. We developed pattern-ranking algorithms to prioritize drug-SE-specific patterns. We then selected a set of patterns with both high precision and high recall to extract drug-SE pairs from the entire MEDLINE corpus. In total, we extracted 38,871 drug-SE pairs from MEDLINE using the learned patterns, the majority of which have not been captured in FDA drug labels to date. On average, our knowledge-driven pattern-learning approach to extracting drug-SE pairs from MEDLINE achieved a precision of 0.833, a recall of 0.407, and an F1 of 0.545. We compared our approach to a support vector machine (SVM)-based machine learning approach and a co-occurrence statistics-based approach. We show that the pattern-learning approach is largely complementary to the SVM- and co-occurrence-based approaches, with significantly higher precision and F1 but lower recall. We demonstrate by correlation analysis that the extracted drug side effects correlate positively with drug targets, metabolism, and indications.
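The seed-driven pipeline described above (known pairs, then matching sentences, patterns, ranking, extraction, and evaluation) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes single-token drug and side-effect mentions and uses the word sequence between the two mentions as the "pattern" in place of the parse-tree patterns the abstract describes. The function names, the seed-precision ranking score, the `min_support` and 0.5 thresholds, and all lexicons, sentences, and gold pairs are hypothetical.

```python
from collections import defaultdict


def candidate_patterns(sentence, drugs, effects):
    """Yield (drug, effect, pattern) for each drug/side-effect mention pair.

    Simplification (assumption): mentions are single lowercase tokens, and a
    pattern is the token span between them with DRUG/SE placeholders, in
    place of the parse-tree patterns described in the abstract.
    """
    tokens = sentence.lower().split()
    for i, drug in enumerate(tokens):
        if drug not in drugs:
            continue
        for j, effect in enumerate(tokens):
            if effect not in effects or j == i:
                continue
            if i < j:
                pattern = ("DRUG",) + tuple(tokens[i + 1:j]) + ("SE",)
            else:
                pattern = ("SE",) + tuple(tokens[j + 1:i]) + ("DRUG",)
            yield drug, effect, pattern


def rank_patterns(corpus, drugs, effects, seeds, min_support=2):
    """Score each pattern by the share of its matches that are known seed pairs.

    Hypothetical ranking: the abstract does not specify its pattern-ranking
    algorithms, so this uses a simple seed-precision score.
    """
    total = defaultdict(int)
    hits = defaultdict(int)
    for sentence in corpus:
        for drug, effect, pattern in candidate_patterns(sentence, drugs, effects):
            total[pattern] += 1
            if (drug, effect) in seeds:
                hits[pattern] += 1
    scored = [(p, hits[p] / total[p]) for p in total if hits[p] >= min_support]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def extract_pairs(corpus, drugs, effects, patterns):
    """Return every (drug, effect) pair matched by one of the selected patterns."""
    found = set()
    for sentence in corpus:
        for drug, effect, pattern in candidate_patterns(sentence, drugs, effects):
            if pattern in patterns:
                found.add((drug, effect))
    return found


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 (harmonic mean of P and R)."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): lexicons, seed pairs standing in for FDA-label
# associations, a tiny corpus, and a gold standard for evaluation.
DRUGS = {"aspirin", "warfarin", "ibuprofen", "drugx"}
EFFECTS = {"bleeding", "nausea", "headache", "rash"}
SEEDS = {("aspirin", "bleeding"), ("warfarin", "bleeding"), ("ibuprofen", "nausea")}
CORPUS = [
    "aspirin induced bleeding in elderly patients",
    "warfarin induced bleeding was reported",
    "ibuprofen induced nausea in two cases",
    "aspirin and headache were both recorded",
    "warfarin and rash were both recorded",
    "drugx induced rash in one patient",
]
GOLD = SEEDS | {("drugx", "rash"), ("aspirin", "headache")}

ranked = rank_patterns(CORPUS, DRUGS, EFFECTS, SEEDS)
selected = {pattern for pattern, score in ranked if score >= 0.5}
extracted = extract_pairs(CORPUS, DRUGS, EFFECTS, selected)
precision, recall, f1 = precision_recall_f1(extracted, GOLD)
print(ranked)
print(sorted(extracted))
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

On this toy corpus only the "DRUG induced SE" pattern survives ranking, and it also recovers the non-seed pair (drugx, rash), mirroring how learned patterns can surface associations absent from the seed labels. Precision is 1.0 but the "and" co-occurrence sentence is missed, so recall drops to 0.8: the same high-precision, lower-recall trade-off the abstract reports. Plugging the abstract's averaged precision (0.833) and recall (0.407) into the same F1 formula gives about 0.547 rather than 0.545; averaging F1 across evaluation sets, instead of recomputing it from averaged precision and recall, generally produces a small gap of this kind.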
