Cortez Diego, Forterre Patrick, Gribaldo Simonetta
Institut Pasteur, Département de Microbiologie, Unité de Biologie Moléculaire du Gène chez les Extrêmophiles, Paris, France.
Genome Biol. 2009;10(6):R65. doi: 10.1186/gb-2009-10-6-r65. Epub 2009 Jun 16.
Archaeal and bacterial genomes contain a number of genes of foreign origin that arose from recent horizontal gene transfer, but the role of integrative elements (IEs), such as viruses, plasmids, and transposable elements, in this process has not been extensively quantified. Moreover, it is not known whether IEs play an important role in the origin of ORFans (open reading frames without matches in current sequence databases), whose proportion remains stable despite the growing number of completely sequenced genomes.
We have performed a large-scale survey of potential recently acquired IEs in 119 archaeal and bacterial genomes. We developed an accurate in silico Markov model-based strategy to identify clusters of genes that show atypical sequence composition (clusters of atypical genes, or CAGs) and are thus likely to be recently integrated foreign elements, including IEs. Our method identified a large number of new CAGs. Probabilistic analysis of gene content indicates that 56% of these new CAGs are likely IEs, whereas only 7% likely originated via horizontal gene transfer from distant cellular sources. Thirty-four percent of CAGs remain unassigned, which may reflect the still poor sampling of IEs associated with bacterial and archaeal diversity. Moreover, our study contributes to the issue of the origin of ORFans, because 39% of these are found inside CAGs, many of which likely represent recently acquired IEs.
Our results strongly indicate that archaeal and bacterial genomes contain an impressive proportion of recently acquired foreign genes (including ORFans) coming from a still largely unexplored reservoir of IEs.
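The abstract does not spell out the Markov model-based strategy, so the following is only a minimal sketch of the general idea, not the authors' pipeline. It assumes a first-order Markov model trained on all genes of a genome as a stand-in for the host background, a per-base mean log-likelihood as the composition score, a z-score cutoff, and a minimum run length for calling a cluster. The names `train_markov`, `mean_log_likelihood`, `atypical_flags`, and `cluster_flags`, and the parameters `order`, `pseudocount`, `z_cutoff`, and `min_size`, are all hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def train_markov(seqs, order=1, pseudocount=1.0):
    """Estimate P(next base | previous `order` bases) with additive smoothing."""
    counts, totals = Counter(), Counter()
    for s in seqs:
        for i in range(order, len(s)):
            ctx = s[i - order:i]
            counts[(ctx, s[i])] += 1
            totals[ctx] += 1

    def prob(ctx, nxt):
        # Smoothing keeps unseen transitions from producing log(0).
        return (counts[(ctx, nxt)] + pseudocount) / (
            totals[ctx] + pseudocount * len(ALPHABET)
        )

    return prob


def mean_log_likelihood(seq, prob, order=1):
    """Average log-probability per base of `seq` under the background model."""
    n = len(seq) - order
    if n <= 0:
        return 0.0
    total = sum(math.log(prob(seq[i - order:i], seq[i]))
                for i in range(order, len(seq)))
    return total / n


def atypical_flags(genes, order=1, z_cutoff=-1.5):
    """Flag genes whose score lies far below the genome-wide mean.

    Assumption: the pooled genes approximate the host composition, which
    only holds when foreign genes are a small minority.
    """
    prob = train_markov(genes, order)
    scores = [mean_log_likelihood(g, prob, order) for g in genes]
    mu = sum(scores) / len(scores)
    sd = math.sqrt(sum((s - mu) ** 2 for s in scores) / len(scores))
    if sd == 0:
        return [False] * len(genes)
    return [(s - mu) / sd < z_cutoff for s in scores]


def cluster_flags(flags, min_size=2):
    """Return (first, last) index pairs for runs of >= min_size flagged genes."""
    clusters, start = [], None
    for i, flagged in enumerate(list(flags) + [False]):  # sentinel closes a final run
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            if i - start >= min_size:
                clusters.append((start, i - 1))
            start = None
    return clusters
```

Given gene sequences in chromosomal order, `cluster_flags(atypical_flags(genes))` returns index ranges of candidate clusters. Assigning such clusters to IEs or to distant cellular sources, as the probabilistic gene-content analysis in the abstract does, would be a separate step that this sketch does not attempt.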