Cadag Eithon, Louie Brent, Myler Peter J, Tarczy-Hornoch Peter
Department of Medical Education and Biomedical Informatics, University of Washington, Seattle, WA, USA.
Pac Symp Biocomput. 2007:343-54.
Scientists working on genomics projects often face the difficult task of sifting through large amounts of biological information dispersed across the various online data sources relevant to their area or organism of research. Gene annotation in particular, the process of identifying the functional role of a possible gene, has become increasingly time-consuming and laborious as more genomes are sequenced and the number of candidate genes continues to grow at a near-exponential pace; genes are left un-annotated or, worse, incorrectly annotated. Many groups have attempted to address the annotation backlog with automated annotation systems geared toward specific organisms, which may therefore lack the flexibility and scalability needed to annotate other genomes. In this paper, we present a method and framework that attempt to address problems inherent in both manual and automatic annotation by coupling a data integration system, BioMediator, to an inference engine with the aim of elucidating functional annotations. The framework and heuristics we developed are not specific to any particular genome. We validated the method on a set of randomly selected annotated sequences from a variety of organisms. Preliminary results show that the hybrid data integration and inference approach generates functional annotations that are as good as or better than "gold standard" annotations approximately 80% of the time.