Zheng Yu, Roberts Richard J, Kasif Simon
Bioinformatics Graduate Program, Boston University, Boston, MA 02215, USA.
Genome Biol. 2002 Oct 10;3(11):RESEARCH0060. doi: 10.1186/gb-2002-3-11-research0060.
The current speed of sequencing already exceeds the capacity for annotation, creating a potential bottleneck. A large proportion of the genes in microbial genomes remains uncharacterized. Here we propose a new method for functional annotation that uses the conservation patterns of gene clusters. If several gene clusters show the same coevolution pattern across different genomes, it is reasonable to infer that they are functionally related. The gene cluster phylogenetic profile integrates chromosomal proximity information with phylogenetic profile information, which allows us to infer functional dependences between gene clusters even when they lie far apart on the chromosome.
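The profile construction described above can be sketched in Python under simplifying assumptions. Here a gene cluster is reduced to a pair of genes, a pair counts as present in a genome only when both genes occur there within a hypothetical `max_gap` positions of each other, and "the same coevolution pattern" is taken to mean an identical binary profile. The functions `pair_profile` and `group_by_profile` and the toy genomes are illustrative only; they are not the authors' implementation or data.

```python
from collections import defaultdict

# Hypothetical input format: one dict per genome, mapping a gene (ortholog) name
# to its position in gene order along the chromosome.
Genome = dict[str, int]
Pair = tuple[str, str]


def pair_profile(pair: Pair, genomes: list[Genome], max_gap: int = 1) -> tuple[int, ...]:
    """Binary profile of a gene pair: 1 for a genome in which both genes are
    present and at most `max_gap` positions apart (assumed proximity rule), else 0."""
    a, b = pair
    return tuple(
        int(a in g and b in g and abs(g[a] - g[b]) <= max_gap) for g in genomes
    )


def group_by_profile(
    pairs: list[Pair], genomes: list[Genome], max_gap: int = 1
) -> dict[tuple[int, ...], list[Pair]]:
    """Group pairs with identical profiles; keep only groups of two or more,
    since a pair with a unique profile has no coevolving partner."""
    groups: dict[tuple[int, ...], list[Pair]] = defaultdict(list)
    for pair in pairs:
        groups[pair_profile(pair, genomes, max_gap)].append(pair)
    return {prof: members for prof, members in groups.items() if len(members) > 1}


# Toy genomes, invented for illustration only.
genomes = [
    {"a": 0, "b": 1, "c": 5, "d": 6, "e": 10, "f": 11},
    {"a": 3, "b": 4, "c": 8, "d": 9, "e": 0},
    {"a": 0, "b": 7, "c": 2, "d": 12, "e": 4, "f": 5},
]
pairs = [("a", "b"), ("c", "d"), ("e", "f")]
linked = group_by_profile(pairs, genomes)
print(linked)  # {(1, 1, 0): [('a', 'b'), ('c', 'd')]}
```

The pairs (a, b) and (c, d) share the profile (1, 1, 0) and would be proposed as functionally linked, while (e, f) stands alone. Exact matching is the strictest possible criterion; a looser profile-similarity score could be substituted without changing the overall structure.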
As a proof of concept, we applied our method to the genome of Escherichia coli strain K12. Our method establishes functional relationships among 176 gene clusters, comprising 738 E. coli genes. The accuracy of gene-pair phylogenetic profiles was compared with that of single-gene phylogenetic profiles and was found to be higher. As a result, we are able to suggest functional roles for several previously uncharacterized genes or genomic regions in E. coli. We also examined the robustness of coevolution signals across a larger set of genomes and suggest a possible upper limit on the accuracy of phylogenetic profile methods.
Higher-order phylogenetic profiles, such as gene-pair phylogenetic profiles, can detect functional dependences that are missed by conventional single-gene phylogenetic profiles or by the chromosomal proximity method alone. We show that gene-pair phylogenetic profiles are more accurate than single-gene phylogenetic profiles.
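A toy Python illustration of how a gene-pair profile can carry signal that single-gene profiles and proximity alone miss. It assumes identical binary profiles as the linking criterion, a hypothetical `max_gap` adjacency rule, and invented genomes. Genes p, q, r and s occur in every genome, so their single-gene profiles are identical and uninformative; in the first (reference) genome the pairs (p, q) and (r, s) lie far apart, so proximity alone does not connect them. Their pair profiles, however, vary across genomes and coincide.

```python
# Invented toy genomes: gene name -> position in gene order.
# genomes[0] plays the role of the reference genome (e.g. E. coli).
genomes = [
    {"p": 0, "q": 1, "r": 50, "s": 51},
    {"p": 2, "q": 3, "r": 7, "s": 8},
    {"p": 0, "q": 9, "r": 4, "s": 20},
    {"p": 5, "q": 6, "r": 30, "s": 31},
]


def single_profile(gene: str) -> tuple[int, ...]:
    """Conventional single-gene profile: presence (1) or absence (0) per genome."""
    return tuple(int(gene in g) for g in genomes)


def pair_profile(a: str, b: str, max_gap: int = 1) -> tuple[int, ...]:
    """Gene-pair profile: 1 where both genes are present and adjacent (assumed rule)."""
    return tuple(
        int(a in g and b in g and abs(g[a] - g[b]) <= max_gap) for g in genomes
    )


singles = {gene: single_profile(gene) for gene in "pqrs"}
print(singles)                 # all four genes map to (1, 1, 1, 1)
print(pair_profile("p", "q"))  # (1, 1, 0, 1)
print(pair_profile("r", "s"))  # (1, 1, 0, 1), the same pattern as (p, q)
print(abs(genomes[0]["p"] - genomes[0]["r"]))  # 50: far apart in the reference genome
```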