Institute for Medical Genetics, Charité-Universitätsmedizin, Berlin, Germany.
PLoS One. 2010 Jan 28;5(1):e8861. doi: 10.1371/journal.pone.0008861.
Whole-genome gene order evolution in higher eukaryotes was initially considered a random process. Gene order conservation, or conserved synteny, was seen as a feature of common descent and did not imply the existence of functional constraints. This view had to be revised in the light of results from sequencing dozens of vertebrate genomes. It became apparent that other factors exist that constrain gene order in some genomic regions over long evolutionary time periods. Outside of these regions, genomes diverge more rapidly in terms of gene content and order.
We have developed CYNTENATOR, a progressive gene order alignment software, to identify genomic regions of conserved synteny over a large set of diverging species. CYNTENATOR does not depend on nucleotide-level alignments or a priori homology assignment. Our software implements an improved scoring function that utilizes the underlying phylogeny.
In this manuscript, we report on our progressive gene order alignment approach, compare it with previous software, and analyze 17 vertebrate genomes for conservation in gene order. CYNTENATOR has a runtime complexity of … and a space complexity of …, with … being the gene number in a genome. CYNTENATOR performs as well as state-of-the-art software on simulated pairwise gene order comparisons, but is the only algorithm that works in practice for aligning dozens of vertebrate-sized gene orders.
Lineage-specific characterization of gene order across 17 vertebrate genomes revealed mechanisms for maintaining conserved synteny, such as enhancers and coregulation by bidirectional promoters. In primate genome comparisons, genes outside conserved synteny blocks are enriched for functions involved in responses to external stimuli, such as immunity and olfactory response. We even see significant gene ontology term enrichments for breakpoint regions of ancestral nodes close to the root of the phylogeny. Additionally, our analysis of transposable elements has revealed a significant accumulation of LINE-1 elements in mammalian breakpoint regions. In summary, CYNTENATOR is a flexible and scalable tool for the identification of conserved gene orders across multiple species over long evolutionary distances.
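To make the idea of aligning gene orders without nucleotide-level alignments more concrete, the following is a minimal sketch of a pairwise local alignment over two gene orders. It is a generic Smith-Waterman-style dynamic program and is not CYNTENATOR's actual implementation: the function name `local_gene_order_alignment`, the `similarity` callback, the gap and mismatch penalties, and the example gene names are all assumptions chosen for illustration. In particular, the phylogeny-aware scoring function and the progressive multi-species step mentioned in the abstract are not modeled here.

```python
import math
from typing import Callable, List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def local_gene_order_alignment(
    order_a: List[str],
    order_b: List[str],
    similarity: Callable[[str, str], float],
    gap: float = -1.0,        # hypothetical penalty for skipping a gene
    mismatch: float = -1.0,   # hypothetical penalty for pairing unrelated genes
) -> Tuple[float, List[Pair]]:
    """Return the best local alignment score and the aligned gene pairs."""
    n, m = len(order_a), len(order_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)

    def substitution(i: int, j: int) -> float:
        s = similarity(order_a[i - 1], order_b[j - 1])
        return s if s > 0 else mismatch

    # Fill the dynamic programming matrix; scores never drop below zero,
    # which is what makes the alignment local rather than global.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cell = max(
                0.0,
                score[i - 1][j - 1] + substitution(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            score[i][j] = cell
            if cell > best:
                best, best_pos = cell, (i, j)

    # Trace back from the best cell until the score reaches zero.
    pairs: List[Pair] = []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        if math.isclose(score[i][j], score[i - 1][j - 1] + substitution(i, j)):
            pairs.append((order_a[i - 1], order_b[j - 1]))
            i, j = i - 1, j - 1
        elif math.isclose(score[i][j], score[i - 1][j] + gap):
            pairs.append((order_a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, order_b[j - 1]))
            j -= 1
    pairs.reverse()
    return best, pairs


def same_family(gene_a: str, gene_b: str) -> float:
    # Toy similarity: identical labels stand in for homologous genes.
    return 2.0 if gene_a == gene_b else 0.0


# Illustrative gene orders (made-up ordering, not real annotation data).
species_a = ["hoxa1", "hoxa2", "hoxa3", "evx1"]
species_b = ["skap2", "hoxa1", "hoxa2", "hoxa3", "evx1", "hibadh"]
block_score, block = local_gene_order_alignment(species_a, species_b, same_family)
print(block_score, block)
```

In this toy setting the similarity callback plays the role that a protein-level similarity score could play in practice, so no prior homology assignment is needed beyond the pairwise scores. The matrix fill is quadratic in the number of genes for a single pair of gene orders; this says nothing about the complexity of the published method.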