Xu Rong, Wang QuanQiu
BMC Bioinformatics. 2015;16 Suppl 5(Suppl 5):S6. doi: 10.1186/1471-2105-16-S5-S6. Epub 2015 Mar 18.
Systems approaches to studying drug-side-effect (drug-SE) associations are emerging as an active research area for both drug target discovery and drug repositioning. However, a comprehensive drug-SE association knowledge base does not exist. In this study, we present a novel knowledge-driven (KD) approach to effectively extract a large number of drug-SE pairs from published biomedical literature.
For the text corpus, we used 21,354,075 MEDLINE records (119,085,682 sentences). First, we used known drug-SE associations derived from FDA drug labels as prior knowledge to automatically find SE-related sentences and abstracts. We then extracted a total of 49,575 drug-SE pairs from MEDLINE sentences and 180,454 pairs from abstracts.
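The two-step idea in the paragraph above can be illustrated with a deliberately simplified sketch. Known drug-SE pairs act as seeds to pick out side-effect-related sentences, and candidate pairs are then read off those sentences. The abstract does not say how terms are matched or how pairs are extracted from the retrieved sentences. This sketch therefore assumes plain whole-word lexicon matching and sentence-level co-occurrence. It is not the authors' method. The helpers `mentions`, `find_seed_sentences`, and `extract_pairs` and the toy inputs are hypothetical.

```python
import re
from itertools import product

def mentions(term, text):
    """True if `term` occurs in `text` as a whole word or phrase (case-insensitive)."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None

def find_seed_sentences(sentences, known_pairs):
    """Keep sentences that mention both the drug and the SE of at least one known pair.

    Hypothetical stand-in for using FDA-label pairs as prior knowledge.
    """
    return [
        s for s in sentences
        if any(mentions(drug, s) and mentions(se, s) for drug, se in known_pairs)
    ]

def extract_pairs(sentences, drugs, side_effects):
    """Return every (drug, SE) combination that co-occurs in one of the sentences.

    Simplifying assumption: co-occurrence within a sentence counts as a candidate pair.
    """
    pairs = set()
    for s in sentences:
        found_drugs = [d for d in drugs if mentions(d, s)]
        found_ses = [se for se in side_effects if mentions(se, s)]
        pairs.update(product(found_drugs, found_ses))
    return pairs

# Hypothetical toy inputs, not data from the study.
known = {("warfarin", "bleeding")}
drugs = {"warfarin", "aspirin"}
side_effects = {"bleeding", "rash"}
sentences = [
    "Warfarin therapy was complicated by bleeding and rash.",
    "Aspirin is widely prescribed.",
]

seed = find_seed_sentences(sentences, known)    # only the first sentence qualifies
candidates = extract_pairs(seed, drugs, side_effects)
new_pairs = candidates - known                  # pairs not already in the seed set
print(sorted(new_pairs))  # [('warfarin', 'rash')]
```

The seed pair retrieves a sentence that also mentions a second side effect, so a pair outside the prior knowledge surfaces. A realistic pipeline would need proper drug and SE entity recognition and some way to filter out the many spurious co-occurrences this naive version would produce.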
On average, the KD approach achieved a precision of 0.335, a recall of 0.509, and an F1 of 0.392, which is significantly better than an SVM-based machine learning approach (precision: 0.135, recall: 0.900, F1: 0.233), a 73.0% increase in F1 score. Through integrative analysis, we demonstrate that the higher-level phenotypic drug-SE relationships reflect lower-level genetic, genomic, and chemical drug mechanisms. In addition, we show that the extracted drug-SE pairs can be used directly in drug repositioning.
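The precision, recall, and F1 figures above follow the standard definitions, with F1 being the harmonic mean of precision and recall. The sketch below shows that arithmetic for a set of extracted pairs. It assumes a reference ("gold") set of drug-SE pairs to score against, which the abstract does not describe. The function names are hypothetical and this is not the authors' evaluation code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def evaluate(predicted, gold):
    """Precision, recall and F1 of a predicted set of drug-SE pairs against a reference set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)

# F1 recomputed from the averaged precision/recall values quoted in the abstract.
print(round(f1_score(0.135, 0.900), 3))  # 0.235 (SVM; reported F1: 0.233)
print(round(f1_score(0.335, 0.509), 3))  # 0.404 (KD; reported F1: 0.392)
```

The recomputed values come close to, but do not exactly match, the reported F1 scores. That is expected if, as "on average" suggests, each reported F1 is a mean of per-evaluation F1 scores. Such a mean generally differs from the F1 of the mean precision and mean recall.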
In summary, we automatically constructed a large-scale knowledge base of higher-level drug-phenotype relationships, which could have great potential in computational drug discovery.