Large-scale extraction of accurate drug-disease treatment pairs from biomedical literature for drug repurposing.

Affiliation

Medical Informatics Division, Case Western Reserve, Cleveland, OH, USA.

Publication information

BMC Bioinformatics. 2013 Jun 6;14:181. doi: 10.1186/1471-2105-14-181.

Abstract

BACKGROUND

A large-scale, highly accurate, machine-understandable knowledge base of drug-disease treatment relationships is important for computational approaches to drug repurposing. The large body of published biomedical research articles and clinical case reports available on MEDLINE is a rich source of FDA-approved drug-disease indications as well as drug-repurposing knowledge, which is crucial for applying FDA-approved drugs to new diseases. However, much of this information is buried in free text and is not captured in any existing database. The goal of this study is to extract a large number of accurate drug-disease treatment pairs from the published literature.

RESULTS

In this study, we developed a simple but highly accurate pattern-learning approach to extract treatment-specific drug-disease pairs from 20 million biomedical abstracts available on MEDLINE. We extracted a total of 34,305 unique drug-disease treatment pairs, the majority of which are not included in existing structured databases. Our algorithm achieved a precision of 0.904 and a recall of 0.131 in extracting all pairs, and a precision of 0.904 and a recall of 0.842 in extracting frequent pairs. In addition, we showed that the extracted pairs correlate strongly with both drug target genes and therapeutic classes, and therefore may have high potential in drug discovery.
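As a rough illustration of what pattern-based extraction and its evaluation can look like, the following minimal Python sketch matches a single textual pattern against sentences and scores the result against a reference set. It is not the authors' implementation: the pattern "DRUG in the treatment of DISEASE", the function names, the toy lexicons, and the toy gold standard are all assumptions made for this example, since the abstract does not list the learned patterns or the evaluation data.

```python
import re
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (drug, disease), both lower-cased


def build_regex(drugs: Iterable[str], diseases: Iterable[str]) -> "re.Pattern[str]":
    """Compile one hypothetical treatment pattern: '<drug> in the treatment of <disease>'."""
    # Longest names first, so multi-word names win over any shorter name they contain.
    drug_alt = "|".join(re.escape(d) for d in sorted(drugs, key=len, reverse=True))
    disease_alt = "|".join(re.escape(d) for d in sorted(diseases, key=len, reverse=True))
    return re.compile(
        rf"\b(?P<drug>{drug_alt})\s+in\s+the\s+treatment\s+of\s+(?P<disease>{disease_alt})\b",
        re.IGNORECASE,
    )


def extract_pairs(sentences: Iterable[str], drugs: Iterable[str],
                  diseases: Iterable[str]) -> Set[Pair]:
    """Return the unique (drug, disease) pairs that match the pattern."""
    regex = build_regex(drugs, diseases)
    pairs: Set[Pair] = set()
    for sentence in sentences:
        for match in regex.finditer(sentence):
            pairs.add((match.group("drug").lower(), match.group("disease").lower()))
    return pairs


def precision_recall(extracted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Precision = correct / extracted; recall = correct / gold."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy data, invented for illustration only.
sentences = [
    "Metformin in the treatment of type 2 diabetes has been widely studied.",
    "We evaluated aspirin in the treatment of migraine.",
    "Aspirin may cause gastric bleeding.",  # co-occurrence, but not a treatment pattern
]
drugs = {"metformin", "aspirin"}
diseases = {"type 2 diabetes", "migraine", "gastric bleeding"}
gold = {("metformin", "type 2 diabetes"), ("aspirin", "migraine"), ("aspirin", "headache")}

extracted = extract_pairs(sentences, drugs, diseases)
precision, recall = precision_recall(extracted, gold)
print(sorted(extracted), precision, recall)
```

The sketch also suggests why a strict pattern can yield high precision with low recall over all pairs: a pair is found only when it is stated in exactly the matched form, so a pair mentioned rarely or phrased differently is missed, while a pair that appears frequently has more chances to match.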

CONCLUSIONS

We demonstrated that our simple pattern-learning relationship extraction algorithm can accurately extract, from the free text of biomedical literature, many drug-disease pairs that are not captured in structured databases. The large-scale, accurate, machine-understandable drug-disease treatment knowledge base resulting from our study, in combination with pairs from structured databases, will have high potential in computational drug repurposing tasks.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c88f/3702428/bb06459b4c5a/1471-2105-14-181-1.jpg
