Comparing a knowledge-driven approach to a supervised machine learning approach in large-scale extraction of drug-side effect relationships from free-text biomedical literature.

Author information

Xu Rong, Wang QuanQiu

Publication information

BMC Bioinformatics. 2015;16 Suppl 5(Suppl 5):S6. doi: 10.1186/1471-2105-16-S5-S6. Epub 2015 Mar 18.

Abstract

BACKGROUND

Systems approaches to studying drug-side-effect (drug-SE) associations are emerging as an active research area for both drug target discovery and drug repositioning. However, a comprehensive drug-SE association knowledge base does not exist. In this study, we present a novel knowledge-driven (KD) approach to effectively extract a large number of drug-SE pairs from published biomedical literature.

DATA AND METHODS

For the text corpus, we used 21,354,075 MEDLINE records (119,085,682 sentences). First, we used known drug-SE associations derived from FDA drug labels as prior knowledge to automatically find SE-related sentences and abstracts. We then extracted a total of 49,575 drug-SE pairs from MEDLINE sentences and 180,454 pairs from abstracts.
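The two steps above can be illustrated with a minimal sketch. It is one simple reading of the abstract, not the authors' implementation: it assumes a sentence counts as SE-related when it mentions a known drug-SE pair, and that every drug-SE co-occurrence in such a sentence is then extracted. The lexicons, the known pairs, and the names `mentions` and `extract_pairs` are hypothetical.

```python
import re

# Hypothetical toy inputs; the study used FDA drug labels and all of MEDLINE.
KNOWN_PAIRS = {("aspirin", "bleeding"), ("metformin", "nausea")}
DRUGS = {"aspirin", "metformin", "warfarin"}
SIDE_EFFECTS = {"bleeding", "nausea", "dizziness"}


def mentions(sentence, lexicon):
    """Return the lexicon terms that occur as whole words in the sentence."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return tokens & lexicon


def extract_pairs(sentences):
    """Assumed two-step procedure:
    1. keep sentences that mention at least one known drug-SE pair;
    2. emit every drug-SE co-occurrence found in the kept sentences.
    """
    extracted = set()
    for sentence in sentences:
        drugs = mentions(sentence, DRUGS)
        side_effects = mentions(sentence, SIDE_EFFECTS)
        candidates = {(d, se) for d in drugs for se in side_effects}
        if candidates & KNOWN_PAIRS:  # prior knowledge flags the sentence
            extracted |= candidates
    return extracted


sentences = [
    "Aspirin and warfarin both increase the risk of bleeding.",
    "Warfarin was given to patients with dizziness.",
]
pairs = extract_pairs(sentences)
```

In this toy run the first sentence is kept because it contains the known pair (aspirin, bleeding), which also yields the new pair (warfarin, bleeding). The second sentence contains no known pair and is skipped.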

RESULTS

On average, the KD approach achieved a precision of 0.335, a recall of 0.509, and an F1 of 0.392, which is significantly better than an SVM-based machine learning approach (precision: 0.135, recall: 0.900, F1: 0.233), a 73.0% increase in F1 score. Through integrative analysis, we demonstrate that the higher-level phenotypic drug-SE relationships reflect lower-level genetic, genomic, and chemical drug mechanisms. In addition, we show that the extracted drug-SE pairs can be directly used in drug repositioning.
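For reference, the F1 score is the harmonic mean of precision and recall. The sketch below shows the standard formula and a relative-increase helper; the function names are illustrative. Because the abstract reports averages, values recomputed from the averaged precision and recall need not match the reported F1 scores or the reported increase exactly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_increase(new, old):
    """Fractional improvement of `new` over `old`."""
    return (new - old) / old


# Recomputed from the averaged precision and recall given in the abstract.
kd_f1 = f1_score(0.335, 0.509)
svm_f1 = f1_score(0.135, 0.900)
gain = relative_increase(kd_f1, svm_f1)
```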

CONCLUSION

In summary, we automatically constructed a large-scale knowledge base of higher-level drug-phenotype relationships, which has great potential in computational drug discovery.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2bf6/4402591/7f042fc4e64a/1471-2105-16-S5-S6-1.jpg
