Biomediator data integration and inference for functional annotation of anonymous sequences.

Authors

Cadag Eithon, Louie Brent, Myler Peter J, Tarczy-Hornoch Peter

Affiliation

Department of Medical Education and Biomedical Informatics, University of Washington, Seattle, WA, USA.

Publication information

Pac Symp Biocomput. 2007:343-54.

PMID:17990504

Abstract

Scientists working on genomics projects often face the difficult task of sifting through large amounts of biological information that is relevant to their area or organism of research but dispersed across various online data sources. Gene annotation, the process of identifying the functional role of a possible gene, has in particular become increasingly time-consuming and laborious as more genomes are sequenced and the number of candidate genes continues to grow at a near-exponential pace; genes are left unannotated or, worse, incorrectly annotated. Many groups have attempted to address the annotation backlog through automated annotation systems that are geared toward specific organisms and may therefore lack the flexibility and scalability needed to annotate other genomes. In this paper, we present a method and framework that attempt to address problems inherent in manual and automatic annotation by coupling a data integration system, BioMediator, to an inference engine with the aim of elucidating functional annotations. The framework and heuristics developed are not specific to any particular genome. We validated the method with a set of randomly selected annotated sequences from a variety of organisms. Preliminary results show that the hybrid data integration and inference approach generates functional annotations that are as good as or better than "gold standard" annotations approximately 80% of the time.
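The abstract does not spell out the heuristics or the BioMediator interface, so the following is only a minimal sketch of the general idea of feeding integrated evidence to a rule-based inference step. Every name in it (`Hit`, `integrate`, `infer_annotation`, the source labels, the score threshold, and the example descriptions) is a hypothetical stand-in for illustration, not the authors' implementation or any real BioMediator API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One piece of evidence about a query sequence (hypothetical schema)."""
    source: str        # label of the data source that produced the hit
    description: str   # functional description reported by that source
    score: float       # confidence normalized to [0, 1]; an assumption of this sketch


def integrate(per_source_hits):
    """Stand-in for the integration step: flatten per-source results into one list."""
    return [hit for hits in per_source_hits.values() for hit in hits]


def infer_annotation(hits, min_score=0.5):
    """Toy heuristic: drop weak hits, then pick the description with the highest total score."""
    totals = {}
    for hit in hits:
        if hit.score >= min_score:
            totals[hit.description] = totals.get(hit.description, 0.0) + hit.score
    if not totals:
        # No confident evidence: fall back to a generic placeholder label.
        return "hypothetical protein"
    return max(totals, key=totals.get)


# Made-up evidence for a single anonymous sequence.
evidence = {
    "similarity_search": [
        Hit("similarity_search", "ATP-dependent helicase", 0.9),
        Hit("similarity_search", "kinase", 0.4),
    ],
    "domain_db": [Hit("domain_db", "ATP-dependent helicase", 0.7)],
}
annotation = infer_annotation(integrate(evidence))
print(annotation)
```

Keeping the integration step separate from the inference step mirrors the coupling described in the abstract: the rules only see a uniform list of evidence, so in principle they need not be tied to one genome or one data source.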

Similar articles

Large-scale prokaryotic gene prediction and comparison to genome annotation.

Bioinformatics. 2005 Dec 15;21(24):4322-9. doi: 10.1093/bioinformatics/bti701. Epub 2005 Oct 25.

FIGENIX: intelligent automation of genomic annotation: expertise integration in a new software platform.

BMC Bioinformatics. 2005 Aug 5;6:198. doi: 10.1186/1471-2105-6-198.

A procedure for assessing GO annotation consistency.

Bioinformatics. 2005 Jun;21 Suppl 1:i136-43. doi: 10.1093/bioinformatics/bti1019.

MILANO--custom annotation of microarray results using automatic literature searches.

BMC Bioinformatics. 2005 Jan 20;6:12. doi: 10.1186/1471-2105-6-12.

CoGenT++: an extensive and extensible data environment for computational genomics.

Bioinformatics. 2005 Oct 1;21(19):3806-10. doi: 10.1093/bioinformatics/bti579.

Computational gene annotation in new genome assemblies using GeneID.

Methods Mol Biol. 2009;537:243-61. doi: 10.1007/978-1-59745-251-9_12.

JUICE: a data management system that facilitates the analysis of large volumes of information in an EST project workflow.

BMC Bioinformatics. 2006 Nov 23;7:513. doi: 10.1186/1471-2105-7-513.

Automatic detection of subsystem/pathway variants in genome analysis.

Bioinformatics. 2005 Jun;21 Suppl 1:i478-86. doi: 10.1093/bioinformatics/bti1052.

GeneTools--application for functional annotation and statistical hypothesis testing.

BMC Bioinformatics. 2006 Oct 24;7:470. doi: 10.1186/1471-2105-7-470.

Cited by

No wisdom in the crowd: genome annotation in the era of big data - current status and future prospects.

Microb Biotechnol. 2018 Jul;11(4):588-605. doi: 10.1111/1751-7915.13284. Epub 2018 May 28.

Learning virulent proteins from integrated query networks.

BMC Bioinformatics. 2012 Dec 2;13:321. doi: 10.1186/1471-2105-13-321.

Transparent mediation-based access to multiple yeast data sources using an ontology driven interface.

BMC Bioinformatics. 2012 Jan 25;13 Suppl 1(Suppl 1):S7. doi: 10.1186/1471-2105-13-S1-S7.

Biomedical informatics and translational medicine.

J Transl Med. 2010 Feb 26;8:22. doi: 10.1186/1479-5876-8-22.

Evaluation of probabilistic and logical inference for a SNP annotation system.

J Biomed Inform. 2010 Jun;43(3):407-18. doi: 10.1016/j.jbi.2009.12.002. Epub 2009 Dec 14.

Evaluating the accuracy of a functional SNP annotation system.

BMC Bioinformatics. 2009 Sep 17;10 Suppl 9(Suppl 9):S11. doi: 10.1186/1471-2105-10-S9-S11.

SNPit: a federated data integration system for the purpose of functional SNP annotation.

Comput Methods Programs Biomed. 2009 Aug;95(2):181-9. doi: 10.1016/j.cmpb.2009.02.010. Epub 2009 Mar 26.

Knowledge-based expert systems and a proof-of-concept case study for multiple sequence alignment construction and analysis.

Brief Bioinform. 2009 Jan;10(1):11-23. doi: 10.1093/bib/bbn045. Epub 2008 Oct 29.

GenoQuery: a new querying module for functional annotation in a genomic warehouse.

Bioinformatics. 2008 Jul 1;24(13):i322-9. doi: 10.1093/bioinformatics/btn159.

Issues in biomedical research data management and analysis: needs and barriers.

J Am Med Inform Assoc. 2007 Jul-Aug;14(4):478-88. doi: 10.1197/jamia.M2114. Epub 2007 Apr 25.
