Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, University of Muenster (WWU), Germany.
Genome Biol Evol. 2012;4(3):316-29. doi: 10.1093/gbe/evs004. Epub 2012 Jan 16.
Plant genomes are generally very large, mostly paleopolyploid, and have numerous gene duplicates and complex genomic features such as repeats and transposable elements. Many of these features have been hypothesized to enable plants, which cannot easily escape environmental challenges, to rapidly adapt. Another mechanism, which has recently been well described as a major facilitator of rapid adaptation in bacteria, animals, and fungi but not yet for plants, is modular rearrangement of protein-coding genes. Due to the high precision of profile-based methods, rearrangements can be well captured at the protein level by characterizing the emergence, loss, and rearrangements of protein domains, the structural, functional, and evolutionary building blocks of proteins. Here, we study the dynamics of domain rearrangements and explore their adaptive benefit in 27 plant and 3 algal genomes. We use a phylogenomic approach by which we can explain the formation of 88% of all arrangements by single-step events, such as fusion, fission, and terminal loss of domains. We find that many domains are lost along every lineage, but at least 500 domains are novel, that is, they are unique to green plants and emerged more or less recently. These novel domains duplicate and rearrange more readily within their genomes than ancient domains and are disproportionately involved in stress response and developmental innovations. Novel domains more often affect regulatory proteins and show a higher degree of structural disorder than ancient domains. Whereas a relatively large and well-conserved core set of single-domain proteins exists, long multi-domain arrangements tend to be species-specific. We find that duplicated genes are more often involved in rearrangements. Although fission events typically impact metabolic proteins, fusion events often create new signaling proteins essential for environmental sensing.
Taken together, the high volatility of single domains and complex arrangements in plant genomes demonstrates the importance of modularity for environmental adaptability of plants.
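To make the notion of a single-step event concrete, the following is a minimal sketch of how one domain arrangement might be classified against the arrangements inferred at its ancestral node. It is not the authors' pipeline: the function name `classify_single_step`, the representation of arrangements as tuples of domain names, the rule used to tell fission from terminal loss (whether the complementary fragment also survives in the descendant), and the domain names in the example are all assumptions made for illustration.

```python
from typing import Iterable, Optional

# An arrangement is the N- to C-terminal order of domains in one protein.
Arrangement = tuple[str, ...]


def classify_single_step(
    child: Arrangement,
    ancestral: Iterable[Arrangement],
    descendant: Iterable[Arrangement],
) -> Optional[str]:
    """Label `child` as conserved, fusion, fission, or terminal_loss.

    Returns None when no single-step explanation is found.
    Simplifying assumption: a truncated arrangement counts as a fission
    if the removed fragment is also present in the descendant set,
    and as a terminal loss otherwise.
    """
    anc = set(ancestral)
    desc = set(descendant)
    if not child:
        return None
    if child in anc:
        return "conserved"

    # Fusion: the child is the concatenation of two ancestral arrangements.
    for i in range(1, len(child)):
        if child[:i] in anc and child[i:] in anc:
            return "fusion"

    # Fission or terminal loss: the child is a prefix or suffix of a
    # longer ancestral arrangement.
    k = len(child)
    truncated = False
    for parent in anc:
        if len(parent) <= k:
            continue
        if parent[:k] == child:
            rest = parent[k:]
        elif parent[-k:] == child:
            rest = parent[:-k]
        else:
            continue
        if rest in desc:
            return "fission"
        truncated = True
    return "terminal_loss" if truncated else None


# Hypothetical example data; the domain names are placeholders.
ancestral = {("LRR",), ("Kinase",), ("A", "B", "C")}
descendant = {("LRR", "Kinase"), ("A", "B"), ("C",), ("X", "Y")}

for arrangement in sorted(descendant):
    print(arrangement, classify_single_step(arrangement, ancestral, descendant))
```

Arrangements that return `None` in such a scheme would correspond to those that cannot be explained by a single fusion, fission, or terminal loss and would need a multi-step explanation.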