Discovering common stem-loop motifs in unaligned RNA sequences.

Author information

Gorodkin J, Stricklin S L, Stormo G D

Affiliation

Department of Genetics and Ecology, The Institute of Biological Sciences, University of Aarhus, Building 540, Ny Munkegade, DK-8000 Aarhus C, Denmark.

Publication information

Nucleic Acids Res. 2001 May 15;29(10):2135-44. doi: 10.1093/nar/29.10.2135.

DOI:10.1093/nar/29.10.2135

PMID:11353083

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC55461/

Abstract

Post-transcriptional regulation of gene expression is often accomplished by proteins binding to specific sequence motifs in mRNA molecules, to affect their translation or stability. The motifs are often composed of a combination of sequence and structural constraints such that the overall structure is preserved even though much of the primary sequence is variable. While several methods exist to discover transcriptional regulatory sites in the DNA sequences of coregulated genes, the RNA motif discovery problem is much more difficult because of covariation in the positions. We describe the combined use of two approaches for RNA structure prediction, FOLDALIGN and COVE, that together can discover and model stem-loop RNA motifs in unaligned sequences, such as UTRs from post-transcriptionally coregulated genes. We evaluate the method on two datasets, one a section of rRNA genes with randomly truncated ends so that a global alignment is not possible, and the other a hyper-variable collection of IRE-like elements that were inserted into randomized UTR sequences. In both cases the combined method identified the motifs correctly, and in the rRNA example we show that it is capable of determining the structure, which includes bulge and internal loops as well as a variable length hairpin loop. Those automated results are quantitatively evaluated and found to agree closely with structures contained in curated databases, with correlation coefficients up to 0.9. A basic server, Stem-Loop Align SearcH (SLASH), which will perform stem-loop searches in unaligned RNA sequences, is available at http://www.bioinf.au.dk/slash/.
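
The abstract reports correlation coefficients of up to 0.9 between predicted and curated structures, but it does not say which coefficient was used. As a minimal sketch, assuming a Matthews correlation coefficient computed over base pairs and structures written in dot-bracket notation (both are assumptions, and the function names are hypothetical, not part of FOLDALIGN, COVE, or SLASH), such a comparison could look like this:

```python
import math


def pairs_from_dot_bracket(structure):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {j}")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return pairs


def matthews_cc(predicted, reference):
    """Matthews correlation coefficient between two structures of equal length.

    Assumption: every unordered pair of positions counts as a candidate base
    pair, so true negatives are all position pairs that neither structure uses.
    """
    if len(predicted) != len(reference):
        raise ValueError("structures must have the same length")
    n = len(reference)
    pred = pairs_from_dot_bracket(predicted)
    ref = pairs_from_dot_bracket(reference)
    tp = len(pred & ref)   # pairs present in both structures
    fp = len(pred - ref)   # predicted pairs absent from the reference
    fn = len(ref - pred)   # reference pairs the prediction missed
    tn = n * (n - 1) // 2 - tp - fp - fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy hairpin: the prediction recovers one of the two reference pairs.
    print(matthews_cc("(....)", "((..))"))
```

Identical structures score 1.0, and a prediction with no base pairs scores 0.0 under this convention. Counting all position pairs as candidates is a simplification; a stricter evaluation might restrict the negatives to pairs that could plausibly form.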
