The Jackson Laboratory, Bar Harbor, Maine, United States of America.
PLoS One. 2012;7(4):e35274. doi: 10.1371/journal.pone.0035274. Epub 2012 Apr 26.
The arrangement of genes along chromosomes is a product of evolutionary processes, and we can expect favorable arrangements to prevail over evolutionary time, often reflected in the non-random clustering of structurally and/or functionally related genes. Such non-random arrangements can arise by two distinct evolutionary processes: duplications of DNA sequences, which give rise to clusters of genes sharing both sequence similarity and common sequence features, and the co-migration of genes related by function but not by common descent. Distinguishing between the two is important for future efforts to unravel the evolutionary processes involved; to provide a background for doing so, we here describe the extent to which ancestrally related genes are found in proximity. Towards this purpose, we combined information from five genomic datasets: InterPro, SCOP, PANTHER, Ensembl protein families, and Ensembl gene paralogs. The results are provided in publicly available datasets (http://cgd.jax.org/datasets/clustering/paraclustering.shtml) describing the extent to which ancestrally related genes are in proximity beyond what is expected by chance (i.e., form paraclusters) in the human and nine other vertebrate genomes, as well as the D. melanogaster, C. elegans, A. thaliana, and S. cerevisiae genomes. With the exception of Saccharomyces, paraclusters are a common feature of the genomes we examined. In the human genome, they are estimated to include at least 22% of all protein-coding genes. Paraclusters are far more prevalent among some gene families than others, are highly species- or clade-specific, and can evolve rapidly, sometimes in response to environmental cues. Altogether, they account for a large portion of the functional clustering previously reported in several genomes.
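The abstract defines paraclusters as ancestrally related genes found in proximity beyond what is expected by chance, but it does not state how that expectation was computed. As a minimal sketch only, and not the authors' method, one common way to test such a claim is a permutation test on gene order along a single chromosome. Everything in the block is an illustrative assumption: the function names `count_close_pairs` and `clustering_p_value`, the representation of genes as integer slots in chromosomal order, the `max_gap` window, and the choice of statistic.

```python
import random


def count_close_pairs(positions, max_gap):
    """Count consecutive family members that lie at most max_gap gene slots apart.

    positions: integer indices of family members in chromosomal gene order
    (a hypothetical representation chosen for this sketch).
    """
    ordered = sorted(positions)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if b - a <= max_gap)


def clustering_p_value(family_positions, n_genes, max_gap=5, n_perm=2000, seed=0):
    """Estimate how often random placement clusters as tightly as observed.

    Null model (an assumption of this sketch): the family's members occupy
    slots drawn uniformly without replacement from n_genes slots.
    Returns (observed statistic, one-sided permutation p-value).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = count_close_pairs(family_positions, max_gap)
    k = len(family_positions)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.sample(range(n_genes), k)
        if count_close_pairs(shuffled, max_gap) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (1 + hits) / (1 + n_perm)


if __name__ == "__main__":
    # Six family members in adjacent slots on a chromosome of 1000 genes.
    tandem = [100, 101, 102, 103, 104, 105]
    # Five family members spread evenly along the same chromosome.
    dispersed = [0, 200, 400, 600, 800]
    print(clustering_p_value(tandem, n_genes=1000))
    print(clustering_p_value(dispersed, n_genes=1000))
```

A tandem array such as the first example yields a small p-value, while the evenly dispersed family yields a p-value of 1 because its observed statistic is zero. A real analysis would also need to handle multiple chromosomes, physical distances, and multiple-testing correction across families, none of which this sketch attempts.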