Recognition of 3'-end L1, Alu, processed pseudogenes, and mRNA stem-loops in the human genome using sequence-based and structure-based machine-learning models.

Affiliation

Laboratory of Bioinformatics, Big Data and Information Retrieval School, Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russia.

Publication information

Sci Rep. 2019 May 10;9(1):7211. doi: 10.1038/s41598-019-43403-3.

Abstract

The role of 3'-end stem-loops in retrotransposition has been experimentally demonstrated for transposons of various species, where LINE-SINE retrotransposons share the same 3'-end sequences, containing a stem-loop. We have discovered that 62-68% of processed pseudogenes and mRNAs also have 3'-end stem-loops. We investigated the properties of 3'-end stem-loops of human L1s, Alus, processed pseudogenes and mRNAs, which do not share the same sequences but all have 3'-end stem-loops. We have built sequence-based and structure-based machine-learning models that are able to recognize 3'-end L1, Alu, processed pseudogene and mRNA stem-loops with high performance. The sequence-based models use only sequence information and capture compositional bias in 3'-ends. The structure-based models consider physical, chemical and geometrical properties of the dinucleotides composing a stem, and the position-specific nucleotide content of a loop and a bulge. The most important parameters include shift, tilt, rise, and hydrophilicity. The obtained results clearly point to the existence of structural constraints for 3'-end stem-loops of L1 and Alu, which are probably important for transposition, and reveal the potential of mRNAs to be recognized by the L1 machinery. The proposed approach is applicable to the broader task of recognizing RNA (DNA) secondary structures. The constructed models are freely available on GitHub (https://github.com/AlexShein/transposons/).
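
To illustrate what "capturing compositional bias" from sequence alone can look like, the following is a minimal sketch that turns a 3'-end sequence into a 16-dimensional vector of dinucleotide frequencies. It is not the authors' feature set or code: the function names `dinucleotide_frequencies` and `feature_vector`, and the choice of dinucleotides as the unit of composition, are assumptions made for illustration only. The abstract does not specify the actual sequence features; see the linked repository for the real models.

```python
from collections import Counter
from itertools import product
from typing import Dict, List

ALPHABET = "ACGT"
# All 16 dinucleotides in a fixed order, so every vector has the same layout.
DINUCLEOTIDES = ["".join(p) for p in product(ALPHABET, repeat=2)]


def dinucleotide_frequencies(seq: str) -> Dict[str, float]:
    """Return the relative frequency of each dinucleotide in `seq`.

    Hypothetical helper: overlapping pairs are counted, RNA 'U' is mapped
    to 'T', and pairs containing ambiguous symbols (e.g. 'N') are skipped.
    """
    seq = seq.upper().replace("U", "T")
    valid = set(DINUCLEOTIDES)
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(p for p in pairs if p in valid)
    total = sum(counts.values())
    return {d: (counts[d] / total if total else 0.0) for d in DINUCLEOTIDES}


def feature_vector(seq: str) -> List[float]:
    """Flatten the frequencies into a fixed-length list for a classifier."""
    freqs = dinucleotide_frequencies(seq)
    return [freqs[d] for d in DINUCLEOTIDES]
```

A vector of this kind could be passed to any standard classifier. The structure-based models described above would additionally need per-dinucleotide physical parameters (such as shift, tilt and rise) taken from a published table, which are deliberately not reproduced here.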

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7d6a/6510757/925ea171f9de/41598_2019_43403_Fig1_HTML.jpg
