Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA.
Department of Bioengineering, University of California San Diego, La Jolla, CA, USA.
Genome Biol. 2023 Aug 8;24(1):183. doi: 10.1186/s13059-023-03028-2.
Cumulative sequencing efforts have yielded enough genomes to construct pangenomes for dozens of bacterial species and elucidate intraspecies gene conservation. Given the diversity of organisms for which this is achievable, similar analyses for ancestral species are feasible through the integration of pangenomics and phylogenetics, promising deeper insights into the nature of ancient life.
We construct pangenomes for 183 bacterial species from 54,085 genomes and identify their core genomes using a novel statistical model that estimates genome-specific error rates and underlying gene frequencies. The core genomes are then integrated into a phylogenetic tree to reconstruct the core genome of the last bacterial common ancestor (LBCA), yielding three main results. First, the gene content of modern and ancestral core genomes is diverse at the level of individual genes, but these genomes are similarly distributed by functional category and share several poorly characterized genes. Second, the LBCA core genome is distinct from any individual modern core genome but has many fundamental biological systems intact, especially translation machinery and biosynthetic pathways to all major nucleotides and amino acids. Third, despite this metabolic versatility, the LBCA core genome likely requires additional non-core genes for viability, based on comparisons with the minimal organism JCVI-Syn3A.
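The abstract does not say which algorithm is used to reconstruct the ancestral core genome. As a rough illustration of how per-species core genomes can be combined over a phylogenetic tree, the sketch below assumes a simple Fitch parsimony pass on binary presence/absence states. This is a stand-in technique, not necessarily the authors' method, and the tree, species labels, gene names, and function names are all hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: either a leaf name or a (left, right) pair.
# This representation is an assumption made for the sketch.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_root_states(tree: Tree, leaf_states: Dict[str, bool]) -> Set[bool]:
    """Bottom-up Fitch pass: return the set of parsimonious states at the root."""
    if isinstance(tree, str):
        return {leaf_states[tree]}
    left = fitch_root_states(tree[0], leaf_states)
    right = fitch_root_states(tree[1], leaf_states)
    common = left & right
    # Intersection if the children agree on a state, otherwise the union.
    return common if common else left | right


def ancestral_core(tree: Tree, core_genomes: Dict[str, Set[str]]) -> Set[str]:
    """Genes whose root state is unambiguously 'present' under Fitch parsimony."""
    all_genes = set().union(*core_genomes.values())
    inferred = set()
    for gene in all_genes:
        states = {species: gene in core for species, core in core_genomes.items()}
        if fitch_root_states(tree, states) == {True}:
            inferred.add(gene)
    return inferred


# Hypothetical toy data: four species and three illustrative gene labels.
toy_tree: Tree = (("A", "B"), ("C", "D"))
toy_cores = {
    "A": {"rpsL", "trpA"},
    "B": {"rpsL", "trpA"},
    "C": {"rpsL"},
    "D": {"rpsL", "lacZ"},
}
root_core = ancestral_core(toy_tree, toy_cores)
print(sorted(root_core))
```

In this toy case only the gene that is core in every species is assigned to the root. A gene that is core in one clade but not the other is left ambiguous and excluded, which is a deliberately conservative choice for the sketch.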
These results suggest that many cellular systems commonly conserved in modern bacteria were not just present in ancient bacteria but were nearly immutable with respect to short-term intraspecies variation. Extending this analysis to other domains of life will likely provide similar insights into more distant ancestral species.