Recognition of 3'-end L1, Alu, processed pseudogenes, and mRNA stem-loops in the human genome using sequence-based and structure-based machine-learning models.

Affiliation

Laboratory of Bioinformatics, Big Data and Information Retrieval School, Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russia.

Publication information

Sci Rep. 2019 May 10;9(1):7211. doi: 10.1038/s41598-019-43403-3.

DOI: 10.1038/s41598-019-43403-3
PMID: 31076573
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6510757/

Abstract

The role of 3'-end stem-loops in retrotransposition has been demonstrated experimentally for transposons of various species, in which LINE and SINE retrotransposons share the same 3'-end sequence, which contains a stem-loop. We found that 62-68% of processed pseudogenes and mRNAs also have 3'-end stem-loops. We investigated the properties of the 3'-end stem-loops of human L1s, Alus, processed pseudogenes, and mRNAs, which do not share the same sequences but all have 3'-end stem-loops. We built sequence-based and structure-based machine-learning models that recognize 3'-end L1, Alu, processed pseudogene, and mRNA stem-loops with high performance. The sequence-based models use only sequence information and capture compositional bias in 3'-ends. The structure-based models consider the physical, chemical, and geometrical properties of the dinucleotides composing a stem, and the position-specific nucleotide content of a loop and a bulge. The most important parameters include shift, tilt, rise, and hydrophilicity. The results point to the existence of structural constraints on the 3'-end stem-loops of L1 and Alu, which are probably important for transposition, and reveal the potential of mRNAs to be recognized by the L1 machinery. The proposed approach is applicable to the broader task of recognizing RNA (DNA) secondary structures. The constructed models are freely available on GitHub (https://github.com/AlexShein/transposons/).
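The abstract names two feature families: sequence-based features that capture compositional bias in 3'-ends, and structure-based features built from dinucleotide properties of the stem plus position-specific nucleotide content of the loop and bulge. The Python below is a minimal sketch of what such features could look like, assuming the stem arm, loop, and bulge have already been extracted by some secondary-structure predictor (not shown). It is not the authors' code (their models are in the GitHub repository above); the function names, `k`, `max_len`, and the property table are hypothetical. Real shift, tilt, rise, or hydrophilicity values would have to come from a published dinucleotide-parameter table; the numbers used here are placeholders.

```python
from itertools import product
from typing import Dict, List

ALPHABET = "ACGT"  # assumption: genomic 3'-end windows are handled as DNA


def kmer_composition(seq: str, k: int = 2) -> List[float]:
    """Sequence-based features: normalized k-mer frequencies of a 3'-end window.

    k-mers containing characters outside ACGT (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(kmers)


def stem_dinucleotide_profile(
    stem_arm: str, props: Dict[str, Dict[str, float]]
) -> Dict[str, float]:
    """Structure-based features: mean of each dinucleotide property
    (e.g. shift, tilt, rise, hydrophilicity) over the steps of one stem arm."""
    stem_arm = stem_arm.upper()
    steps = [stem_arm[i:i + 2] for i in range(len(stem_arm) - 1)]
    profile = {}
    for name, table in props.items():
        values = [table[s] for s in steps if s in table]
        profile[name] = sum(values) / len(values) if values else 0.0
    return profile


def position_one_hot(segment: str, max_len: int = 10) -> List[int]:
    """Position-specific nucleotide content of a loop or bulge: one 4-bit
    one-hot block per position, zero-padded or truncated to max_len positions."""
    segment = segment.upper()
    vec: List[int] = []
    for pos in range(max_len):
        base = segment[pos] if pos < len(segment) else ""
        vec.extend(1 if base == b else 0 for b in ALPHABET)
    return vec


if __name__ == "__main__":
    # PLACEHOLDER property table: invented numbers for illustration only,
    # not published shift/tilt/rise/hydrophilicity values.
    toy_props = {"toy_property": {"GC": 2.0, "CG": 4.0}}
    window = "AATAAA"  # arbitrary short example sequence
    features = (
        kmer_composition(window, k=2)
        + list(stem_dinucleotide_profile("GCG", toy_props).values())  # [3.0]
        + position_one_hot("GA", max_len=3)
    )
    print(len(features))  # 16 + 1 + 12 = 29
```

Concatenating such vectors yields one fixed-length input per candidate 3'-end that a standard classifier could consume. The model types, window lengths, and exact feature sets used in the paper are defined in the authors' repository and are not reproduced by this sketch.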
