Department of Continental Ecology-Biogeodynamics & Biodiversity Group, Centre d'Estudis Avançats de Blanes, CEAB-CSIC, E-17300 Blanes, Girona, Spain.
Mol Phylogenet Evol. 2011 Dec;61(3):650-8. doi: 10.1016/j.ympev.2011.08.011. Epub 2011 Aug 16.
Comparative genomics is an essential tool for unraveling how genomes change over evolutionary time and for gaining clues about the links between functional genomics and evolution. In prokaryotes, the large number of good-quality genome sequences available in public databases, together with recently developed large-scale computational methods, offers an unprecedented view of the ecology and evolution of microorganisms through comparative genomics. In this work, we examined the links between genome structure (i.e., the sequential distribution of nucleotides itself, assessed by detrended fluctuation analysis, DFA) and genomic diversity (i.e., gene functionality, assessed by Clusters of Orthologous Genes, COGs) in 828 fully sequenced prokaryotic genomes from 548 different bacterial and archaeal species. The DFA scaling exponent α indicated persistent long-range correlations (fractality) in each genome analyzed. Higher resolution power was found when considering the sequential succession of purine (AG) vs. pyrimidine (CT) bases than when considering either keto (GT) vs. amino (AC) forms or strongly (GC) vs. weakly (AT) bonded nucleotides. Interestingly, the phyla Aquificae, Fusobacteria, Dictyoglomi, Nitrospirae, and Thermotogae were closer to archaea than to their bacterial counterparts. A strong, significant correlation was found between the scaling exponent α and the COG distribution, and we consistently observed that the larger α was, the more heterogeneous the gene distribution within each functional category, suggesting a close relationship between primary nucleotide sequence structure and functional gene composition.
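As an illustration of how a DFA scaling exponent α can be estimated from a nucleotide sequence, the following is a minimal sketch and not the authors' implementation. It assumes a ±1 encoding of the two base classes, non-overlapping windows, and first-order (linear) detrending; the function names `binary_steps` and `dfa_alpha`, the `plus` parameter, and the window sizes are hypothetical choices made for this example.

```python
import math
import random


def binary_steps(seq, plus="AG"):
    """Encode a DNA string as +1/-1 steps; symbols other than A, C, G, T are skipped.

    plus="AG" gives purine vs. pyrimidine, "GT" keto vs. amino,
    and "GC" strongly vs. weakly bonded nucleotides.
    """
    return [1 if b in plus else -1 for b in seq.upper() if b in "ACGT"]


def _linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dfa_alpha(steps, window_sizes):
    """Estimate alpha as the slope of log F(n) against log n (assumed linear detrending)."""
    # Profile ("DNA walk"): cumulative sum of the mean-centred steps.
    mean_step = sum(steps) / len(steps)
    profile, total = [], 0.0
    for s in steps:
        total += s - mean_step
        profile.append(total)

    log_n, log_f = [], []
    for n in window_sizes:
        xs = list(range(n))
        squared_sum, count = 0.0, 0
        # Non-overlapping windows; a trailing partial window is ignored.
        for start in range(0, len(profile) - n + 1, n):
            window = profile[start:start + n]
            slope, intercept = _linear_fit(xs, window)
            squared_sum += sum(
                (y - (slope * x + intercept)) ** 2 for x, y in zip(xs, window)
            )
            count += n
        # F(n): root-mean-square residual around the local linear trends.
        log_n.append(math.log(n))
        log_f.append(math.log(math.sqrt(squared_sum / count)))

    alpha, _ = _linear_fit(log_n, log_f)
    return alpha


if __name__ == "__main__":
    # Synthetic, uncorrelated sequence used only to exercise the code.
    random.seed(0)
    seq = "".join(random.choice("ACGT") for _ in range(20000))
    sizes = [16, 32, 64, 128, 256, 512]
    for label, plus in [("purine/pyrimidine", "AG"), ("keto/amino", "GT"), ("strong/weak", "GC")]:
        print(label, dfa_alpha(binary_steps(seq, plus), sizes))
```

For an uncorrelated random sequence such as the synthetic one above, α is expected to lie close to 0.5; values above 0.5 correspond to the persistent long-range correlations described in the abstract. Every window size must be no larger than the sequence length.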