Toward an automatic method for extracting cancer- and other disease-related point mutations from the biomedical literature.

Affiliation

University of Maryland, Baltimore County, Baltimore, MD 21250, USA.

Publication information

Bioinformatics. 2011 Feb 1;27(3):408-15. doi: 10.1093/bioinformatics/btq667. Epub 2010 Dec 7.

Abstract

MOTIVATION

A major goal of biomedical research in personalized medicine is to find relationships between mutations and their corresponding disease phenotypes. However, most of the disease-related mutational data are currently buried in the biomedical literature in textual form and lack the necessary structure to allow easy retrieval and visualization. We introduce a high-throughput computational method for the identification of relevant disease mutations in PubMed abstracts applied to prostate (PCa) and breast cancer (BCa) mutations.

RESULTS

We developed the extractor of mutations (EMU) tool to identify mutations and their associated genes. We benchmarked EMU against MutationFinder, a tool that extracts point mutations from text. Our results show that both methods achieve comparable performance on two manually curated datasets. We also benchmarked EMU's performance for extracting the complete mutational information and phenotype. Remarkably, we show that one of the steps in our approach, a filter based on sequence analysis, increases the precision for that task from 0.34 to 0.59 (PCa) and from 0.39 to 0.61 (BCa). We also show that this high-throughput approach can be extended to other diseases.
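To make the idea of a sequence-based filter concrete, here is a minimal Python sketch. It is not EMU's implementation, which the abstract does not describe. It assumes that point mutations are written in the one-letter form wild-type residue, position, mutant residue (for example "T5A"), and that the filter keeps a mention only when the wild-type residue matches the candidate protein sequence at that position. The pattern, the function names and the toy sequence are all hypothetical.

```python
import re

# Assumption: mentions use one-letter amino acid codes, e.g. "T5A".
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
POINT_MUTATION = re.compile(
    rf"\b([{AMINO_ACIDS}])([1-9][0-9]*)([{AMINO_ACIDS}])\b"
)


def find_point_mutations(text):
    """Return (wild_type, position, mutant) tuples found in the text."""
    return [
        (wild_type, int(position), mutant)
        for wild_type, position, mutant in POINT_MUTATION.findall(text)
        if wild_type != mutant  # a substitution must change the residue
    ]


def passes_sequence_filter(mutation, protein_sequence):
    """Keep a mutation only if the sequence has the wild-type residue there.

    Positions are 1-based, as in the usual mutation notation.
    """
    wild_type, position, _mutant = mutation
    if position > len(protein_sequence):
        return False
    return protein_sequence[position - 1] == wild_type


# Toy data, invented for illustration only.
toy_sequence = "MKGLT"
toy_text = "Both T5A and L2R were reported."

candidates = find_point_mutations(toy_text)
kept = [m for m in candidates if passes_sequence_filter(m, toy_sequence)]
```

In this toy case the mention "L2R" is dropped because position 2 of the toy sequence is K rather than L. Discarding mentions that cannot belong to the candidate gene is one plausible way a filter of this kind could raise precision.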

DISCUSSION

Our method improves the current status of disease-mutation databases by significantly increasing the number of annotated mutations. We found 51 and 128 mutations manually verified to be related to PCa and BCa, respectively, that are not currently annotated for these cancer types in the OMIM or Swiss-Prot databases. EMU's retrieval performance represents a 2-fold improvement in the number of annotated mutations for PCa and BCa. We further show that our method can benefit from full-text analysis once there is an increase in Open Access availability of full-text articles.

AVAILABILITY

Freely available at: http://bioinf.umbc.edu/EMU/ftp.
