Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
Computational Biology Group, Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany.
Methods Mol Biol. 2021;2250:1-14. doi: 10.1007/978-1-0716-1134-0_1.
Plant genomes harbor a particularly rich landscape of repetitive sequences. Transposable elements (TEs) represent a major fraction of this diversity and are intimately linked with the plasticity and evolution of genomes across the tree of life (Fedoroff, Science 338:758-767, 2012). Amplification of long terminal repeat (LTR) retrotransposons has shaped the genomic landscape by reshuffling genomic regions, altering gene expression, and providing new regulatory sequences, some of which have been instrumental for crop domestication and breeding (Lisch, Nat Rev Genet 14:49-61, 2013; Vitte et al., Brief Funct Genomics 13:276-295, 2014). Although many retrotransposon families are still active within plant genomes, the repetitive nature of retrotransposons has hindered accurate annotation and kingdom-wide predictive assessment of their activity and molecular evolution. A first-pass genome annotation naturally aims to characterize all regions of the genome and associate them with known structures such as particular genes, transposable elements, or other types of non-coding regions. However, such efforts can yield a large proportion of false-positive annotations when the goal is to find active loci. To overcome this issue, the next round of annotation efforts needs to include functional annotations based on rigorously defined sequence structures and protein domain compositions. In the context of retrotransposons, such a functional annotation can enable efforts to mobilize particular retrotransposon families in extant species and harness their mutagenic potency for crop improvement (Paszkowski, Curr Opin Biotechnol 32:200-206, 2015). For this purpose, we present a predictive analytical approach to infer the activity and natural variation of retrotransposon families in plants. This is achieved by applying a combination of software and molecular biology tools we developed for functional annotation, activity monitoring, and the assessment of the population structure of particular retrotransposon families in multiple plant species.