MetaGene: prokaryotic gene finding from environmental genome shotgun sequences.

Authors

Noguchi Hideki, Park Jungho, Takagi Toshihisa

Affiliation

Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Chiba 277-8562, Japan.

Publication

Nucleic Acids Res. 2006;34(19):5623-30. doi: 10.1093/nar/gkl723. Epub 2006 Oct 5.

Abstract

Exhaustive gene identification is a fundamental goal in all metagenomics projects. However, most metagenomic sequences are unassembled anonymous fragments, and conventional gene-finding methods cannot be applied. We have developed a prokaryotic gene-finding program, MetaGene, which utilizes di-codon frequencies estimated from the GC content of a given sequence, along with various other measures. MetaGene can predict a whole range of prokaryotic genes from anonymous genomic sequences of a few hundred bases, with a sensitivity of 95% and a specificity of 90% for artificial shotgun sequences (700 bp fragments from 12 species). MetaGene has two sets of codon frequency interpolations, one for bacteria and one for archaea, and automatically selects the proper set for a given sequence using the domain classification method we propose. The domain classification works properly, correctly assigning domain information to more than 90% of the artificial shotgun sequences. Applied to the Sargasso Sea dataset, MetaGene predicted almost all of the annotated genes and a notable number of novel genes. MetaGene can be applied to a wide variety of metagenomic projects and expands the utility of metagenomics.
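The two quantities the abstract names, the GC content of a fragment and di-codon (adjacent codon pair) frequencies, can be illustrated with a minimal Python sketch. This is not MetaGene's model or code: the function names `gc_content` and `dicodon_counts` and the toy sequence are assumptions made here for illustration, and the sketch only counts di-codons in one reading frame rather than estimating them from GC content as the program does.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among the A/C/G/T bases of seq (0.0 if none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def dicodon_counts(orf: str) -> Counter:
    """Count adjacent codon pairs in frame 0 of orf (hypothetical helper)."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # drop a trailing partial codon
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    # Each di-codon is a codon joined with its immediate successor.
    return Counter(first + second for first, second in zip(codons, codons[1:]))


# Toy fragment (assumption, not from the paper): ATG GCG AAA GCG AAA TAA
fragment = "ATGGCGAAAGCGAAATAA"
print(gc_content(fragment))
print(dicodon_counts(fragment).most_common(3))
```

In a GC-dependent scheme like the one the abstract describes, per-fragment statistics of this kind would be compared against frequencies expected at the fragment's GC content; how MetaGene performs that estimation is not specified in the abstract.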


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/11af/1636498/53a1d09fce1f/gkl723f1.jpg
