Heuberger Matthias, Koo Dal-Hoe, Ahmed Hanin Ibrahim, Tiwari Vijay K, Abrouk Michael, Poland Jesse, Krattinger Simon G, Wicker Thomas
Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland.
Wheat Genetics Resource Center and Department of Plant Pathology, Kansas State University, Manhattan, KS, 66506, USA.
Mob DNA. 2024 Aug 5;15(1):16. doi: 10.1186/s13100-024-00326-9.
Centromere function is highly conserved across eukaryotes, but the underlying centromeric DNA sequences vary dramatically between species. Centromeres often contain a high proportion of repetitive DNA, such as tandem repeats and/or transposable elements (TEs). Einkorn wheat centromeres lack tandem repeat arrays and are instead composed mostly of two long terminal repeat (LTR) retrotransposon families, RLG_Cereba and RLG_Quinta, which insert specifically into centromeres. However, it is poorly understood how these two TE families relate to each other, and whether and how they contribute to centromere function and evolution.
Based on conservation of diagnostic motifs (LTRs, integrase, primer binding site, and polypurine tract), we propose that RLG_Cereba and RLG_Quinta are a pair of autonomous and non-autonomous partners, in which the autonomous RLG_Cereba contributes all the proteins required for transposition, while the non-autonomous RLG_Quinta contributes GAG protein. Phylogenetic analysis of predicted GAG proteins showed that the RLG_Cereba lineage has been present in monocotyledonous plants for at least 100 million years. In contrast, RLG_Quinta evolved from RLG_Cereba between 28 and 35 million years ago in the common ancestor of oat and wheat. Interestingly, the integrase of RLG_Cereba is fused to a so-called CR domain, which is hypothesized to guide the integrase to the functional centromere. Indeed, ChIP-seq data and TE population analysis show that only the youngest subfamilies of RLG_Cereba and RLG_Quinta are found in the active centromeres. Importantly, the LTRs of RLG_Quinta and RLG_Cereba are strongly associated with the presence of the centromere-specific CENH3 histone variant. We hypothesize that the LTRs of RLG_Cereba and RLG_Quinta contribute to wheat centromere integrity by phasing and/or placing CENH3 nucleosomes, thus favoring their persistence in the competitive centromere niche.
Our data show that RLG_Cereba cross-mobilizes the non-autonomous RLG_Quinta retrotransposons. New copies of both families are specifically integrated into functional centromeres presumably through direct binding of the integrase CR domain to CENH3 histone variants. The LTRs of newly inserted RLG_Cereba and RLG_Quinta elements, in turn, recruit and/or phase new CENH3 deposition. This mutualistic interplay between the two TE families and the plant host dynamically maintains wheat centromeres.