Laboratory of Bioinformatics, Big Data and Information Retrieval School, Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russia.
Sci Rep. 2019 May 10;9(1):7211. doi: 10.1038/s41598-019-43403-3.
The role of 3'-end stem-loops in retrotransposition has been experimentally demonstrated for transposons of various species, in which LINE-SINE retrotransposons share the same 3'-end sequences containing a stem-loop. We have discovered that 62-68% of processed pseudogenes and mRNAs also have 3'-end stem-loops. We investigated the properties of the 3'-end stem-loops of human L1s, Alus, processed pseudogenes and mRNAs, which do not share the same sequences but all have 3'-end stem-loops. We have built sequence-based and structure-based machine-learning models that recognize 3'-end L1, Alu, processed pseudogene and mRNA stem-loops with high performance. The sequence-based models use only sequence information and capture compositional bias in 3'-ends. The structure-based models consider the physical, chemical and geometrical properties of the dinucleotides composing a stem, together with the position-specific nucleotide content of a loop and a bulge. The most important parameters include shift, tilt, rise, and hydrophilicity. The obtained results clearly point to the existence of structural constraints on the 3'-end stem-loops of L1 and Alu, which are probably important for transposition, and reveal the potential of mRNAs to be recognized by the L1 machinery. The proposed approach is applicable to the broader task of recognizing RNA (DNA) secondary structures. The constructed models are freely available on GitHub ( https://github.com/AlexShein/transposons/ ).
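The two kinds of features named in the abstract can be illustrated with a minimal Python sketch. It is not the code from the authors' repository: all function names are hypothetical, and `TOY_TABLE` holds made-up placeholder numbers standing in for a real dinucleotide property table (for example rise or tilt). It shows k-mer frequencies as a sequence-based compositional feature, a dinucleotide property averaged over a stem as a structure-based feature, and a position-specific one-hot encoding of a loop.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
ALL_DINUCS = ["".join(p) for p in product(BASES, repeat=2)]

# ASSUMPTION: placeholder values only (AA=0.0, AC=1.0, ...), not measured
# dinucleotide parameters. Replace with a real property table before use.
TOY_TABLE = {dinuc: float(i) for i, dinuc in enumerate(ALL_DINUCS)}


def kmer_frequencies(seq, k=2):
    """Sequence-based feature: relative frequency of every k-mer over ACGT."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    total = len(kmers)
    keys = ["".join(p) for p in product(BASES, repeat=k)]
    return {key: (counts[key] / total if total else 0.0) for key in keys}


def stem_property_mean(stem, table):
    """Structure-based feature: mean of one property over the stem's dinucleotide steps."""
    stem = stem.upper()
    steps = [stem[i:i + 2] for i in range(len(stem) - 1)]
    if not steps:
        raise ValueError("stem must contain at least two nucleotides")
    return sum(table[step] for step in steps) / len(steps)


def loop_one_hot(loop, max_len=8):
    """Position-specific loop content: one-hot per position, zero-padded to max_len.

    Positions beyond the loop length, and symbols outside ACGT, encode as all zeros.
    """
    loop = loop.upper()
    vector = []
    for i in range(max_len):
        base = loop[i] if i < len(loop) else None
        vector.extend(1 if base == b else 0 for b in BASES)
    return vector
```

Concatenating such vectors per stem-loop would give a fixed-length input for a standard classifier; which classifier and which property tables the published models use is not stated in the abstract.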