School of Biology, Georgia Institute of Technology, Ford ES&T Building, Rm 1242, 311 Ferst Drive, Atlanta, GA 30332, USA.
Genome Biol. 2011;12(3):R26. doi: 10.1186/gb-2011-12-3-r26. Epub 2011 Mar 22.
Combined metagenomic and metatranscriptomic datasets make it possible to study the molecular evolution of diverse microbial species recovered from their native habitats. The link between gene expression level and sequence conservation was examined using shotgun pyrosequencing of microbial community DNA and RNA from diverse marine environments, and from forest soil.
Across all samples, expressed genes (those with transcripts detected in the RNA sample) were significantly more conserved than non-expressed genes, as judged against their best matches in reference databases. This discrepancy, observed in many diverse individual genomes and across entire communities, coincided with a shift in amino acid usage between the two gene fractions. Expressed genes trended toward GC-enriched amino acids, consistent with a hypothesis of higher levels of functional constraint in this gene pool. Highly expressed genes were significantly more likely to fall within an orthologous gene set shared between closely related taxa (core genes). However, non-core genes, when expressed above the level of detection, were on average significantly more highly expressed than core genes, based on transcript abundance normalized to gene abundance. Finally, expressed genes showed broad functional similarities across samples, being relatively enriched in genes of energy metabolism and depleted in genes of cell growth.
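The phrase "transcript abundance normalized to gene abundance" can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the version below is an assumption: it divides a gene's share of reads in the RNA dataset by its share of reads in the DNA dataset. The function name, parameter names, and read counts are all hypothetical.

```python
def normalized_expression(rna_reads, dna_reads, total_rna, total_dna):
    """Return a gene's expression ratio (assumed form, not the paper's stated formula).

    The ratio is the gene's fraction of all RNA reads divided by its
    fraction of all DNA reads. A value above 1 means the gene is more
    abundant in the transcript pool than in the gene pool.
    """
    if dna_reads <= 0 or total_rna <= 0 or total_dna <= 0:
        raise ValueError("dna_reads and both totals must be positive")
    rna_fraction = rna_reads / total_rna
    dna_fraction = dna_reads / total_dna
    return rna_fraction / dna_fraction


# Hypothetical counts: 20 of 1000 RNA reads and 10 of 1000 DNA reads hit the gene.
ratio = normalized_expression(rna_reads=20, dna_reads=10, total_rna=1000, total_dna=1000)
```

Normalizing by gene abundance in this way is meant to keep a gene from appearing highly expressed only because the organism carrying it is common in the community.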
These patterns support the hypothesis, predicated on studies of model organisms, that gene expression level is a primary correlate of evolutionary rate across diverse microbial taxa from natural environments. Despite their complexity, meta-omic datasets can reveal broad evolutionary patterns across taxonomically, functionally, and environmentally diverse communities.