Notebaart Richard A, Huynen Martijn A, Teusink Bas, Siezen Roland J, Snel Berend
Center for Molecular and Biomolecular Informatics, Radboud University, Nijmegen, The Netherlands.
Nucleic Acids Res. 2005 Oct 27;33(19):6164-71. doi: 10.1093/nar/gki913. Print 2005.
Duplicated genes are a key complication for reliable gene function prediction in comparative genomics. To study the effect of gene duplication on function prediction, we analyze orthologs between pairs of genomes in which, in one genome, the orthologous gene duplicated after the speciation event that separated the two genomes (i.e. inparalogs). For these duplicated genes, we investigate whether the gene that is most similar at the sequence level is also the gene that has retained the ancestral gene neighborhood. Although the majority of investigated cases show a consistent pattern between sequence similarity and gene-neighborhood conservation, a substantial fraction (29-38%) is inconsistent. This inconsistency is not a chance outcome caused by too little divergence time between the inparalogs; rather, it appears to be a chance outcome caused by very similar rates of sequence evolution of both inparalogs relative to their ortholog. If one-to-one orthologous relationships are required, it is advisable to combine contextual information (i.e. gene neighborhood in prokaryotes and co-expression in eukaryotes) with protein sequence information to predict the most probable functionally equivalent ortholog in the presence of inparalogs.
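The consistency test described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Inparalog` record, its `similarity` field (any higher-is-better score of a copy against its ortholog, such as a BLAST bit score) and its precomputed `neighborhood_conserved` flag are all hypothetical, and skipping uninformative or tied pairs is an assumption made here. The last function shows one way to act on the closing recommendation by letting conserved context override a small sequence-similarity advantage.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Inparalog:
    """Hypothetical record for one copy of a gene duplicated after speciation."""
    name: str
    similarity: float              # assumed: higher-is-better score to the ortholog
    neighborhood_conserved: bool   # assumed precomputed: copy kept the ancestral neighborhood


Pair = Tuple[Inparalog, Inparalog]


def is_consistent(pair: Pair) -> Optional[bool]:
    """True if the most similar copy is also the one with the conserved neighborhood.

    Returns None (uninformative) when both or neither copy kept the neighborhood,
    or when the similarity scores tie; skipping these cases is an assumption.
    """
    a, b = pair
    if a.neighborhood_conserved == b.neighborhood_conserved or a.similarity == b.similarity:
        return None
    most_similar = a if a.similarity > b.similarity else b
    return most_similar.neighborhood_conserved


def inconsistent_fraction(pairs: Iterable[Pair]) -> float:
    """Fraction of informative inparalog pairs where sequence and context disagree."""
    verdicts = [v for v in map(is_consistent, pairs) if v is not None]
    return verdicts.count(False) / len(verdicts) if verdicts else 0.0


def predict_functional_ortholog(pair: Pair) -> Inparalog:
    """Prefer the copy with conserved context; fall back to sequence similarity."""
    a, b = pair
    if a.neighborhood_conserved != b.neighborhood_conserved:
        return a if a.neighborhood_conserved else b
    return a if a.similarity >= b.similarity else b


# Toy data invented for illustration; these are not values from the study.
pairs = [
    (Inparalog("geneA1", 420.0, True), Inparalog("geneA2", 300.0, False)),
    (Inparalog("geneB1", 250.0, False), Inparalog("geneB2", 260.0, True)),
    (Inparalog("geneC1", 510.0, False), Inparalog("geneC2", 505.0, True)),
]
print(round(inconsistent_fraction(pairs), 3))       # → 0.333
print(predict_functional_ortholog(pairs[2]).name)   # → geneC2
```

In the third toy pair, the copy with the higher similarity score (geneC1) has lost the ancestral neighborhood, so the context-first rule picks geneC2 instead. This is the kind of inconsistent case the abstract describes, where near-equal evolutionary rates make a small difference in sequence similarity an unreliable tie-breaker.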