Sironi Manuela, Menozzi Giorgia, Comi Giacomo P, Cereda Matteo, Cagliani Rachele, Bresolin Nereo, Pozzoli Uberto
Scientific Institute IRCCS E Medea, Bioinformatic Lab, Via don L Monza, 23842 Bosisio Parini (LC), Italy.
Genome Biol. 2006;7(12):R120. doi: 10.1186/gb-2006-7-12-r120.
Transposable elements (TEs) represent more than 45% of the human and mouse genomes. Both parasitic and mutualistic features have been shown to apply to the host-TE relationship, but a comprehensive picture of the forces driving TE fixation within mammalian genes is still missing.
We show that intronic multispecies conserved sequences (MCSs) have affected TE integration frequency over time. We verify that a selective economizing pressure has acted on TEs to decrease their frequency in highly expressed genes. After correcting for GC content, MCS density, and intron size, we identified TE-enriched and TE-depleted gene categories. In addition to developmental regulators and transcription factors, TE-depleted regions encompass loci that might require subtle regulation of transcript levels or precise activation timing, such as growth factors, cytokines, hormones, and genes involved in the immune response. Immune response genes, despite having reduced frequencies of most TE types, are significantly enriched in mammalian-wide interspersed repeats (MIRs). Analysis of orthologous genes indicated that MIR over-representation also occurs in dog and opossum immune response genes. Given the partially independent origin of MIR sequences in eutheria and metatheria, this suggests the evolutionary conservation of a specific function for MIRs located in these loci. Consistent with this, the core MIR sequence is over-represented in defense response genes compared to the background intronic frequency.
Our data indicate that gene function, expression level, and sequence conservation influence TE insertion/fixation in mammalian introns. Moreover, we provide the first report showing that a specific TE family is evolutionarily associated with a gene function category.
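To make concrete what "over-represented compared to the background intronic frequency" can mean, the following is a minimal sketch of a one-sided binomial enrichment test. It is an illustration under stated assumptions, not the statistical procedure used in the study, which the abstract does not specify. The function names (`binomial_upper_tail`, `enrichment`) and all counts are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enrichment(hits_in_category, sites_in_category, background_rate):
    """Fold enrichment and one-sided p-value against a background rate.

    hits_in_category:  sites in the gene category that carry the element
    sites_in_category: all sites examined in that category
    background_rate:   fraction of background intronic sites carrying it
    """
    observed_rate = hits_in_category / sites_in_category
    fold = observed_rate / background_rate
    p_value = binomial_upper_tail(
        hits_in_category, sites_in_category, background_rate
    )
    return fold, p_value


# Hypothetical counts, chosen only for illustration (not data from the paper).
fold, p_value = enrichment(
    hits_in_category=30, sites_in_category=1000, background_rate=0.02
)
print(f"fold enrichment = {fold:.2f}, one-sided p = {p_value:.4f}")
```

A test of this kind treats sites as independent and ignores covariates such as GC content, MCS density, and intron size, which the abstract states were corrected for, so a real analysis would need a model that accounts for them.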