SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics.

Author information

Sebastian Will, Christina Otto, Milad Miladi, Mathias Möhl, Rolf Backofen

Affiliations

Bioinformatics, Department of Computer Science, University of Freiburg, Freiburg, Germany; Bioinformatics, Department of Computer Science, University of Leipzig, Leipzig, Germany.

Bioinformatics, Department of Computer Science, University of Freiburg, Freiburg, Germany.

Publication information

Bioinformatics. 2015 Aug 1;31(15):2489-96. doi: 10.1093/bioinformatics/btv185. Epub 2015 Apr 2.

DOI:10.1093/bioinformatics/btv185

PMID:25838465

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4514930/

Abstract

MOTIVATION

RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis, simultaneous alignment and folding, suffers from an extreme time complexity of $O(n^6)$ for two RNAs of length $n$. Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches such as LocARNA, which do not require sequence-based heuristics, have been limited to high complexity ($\geq$ quartic time).
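To put these exponents in perspective, here is plain arithmetic for two RNAs of an illustrative length $n = 200$ (a length chosen for this example, not taken from the paper), ignoring constant factors:

```latex
n^6 = 200^6 = 6.4 \times 10^{13}, \qquad
n^4 = 200^4 = 1.6 \times 10^{9}, \qquad
n^2 = 200^2 = 4 \times 10^{4}
```

At this length, going from the $O(n^6)$ Sankoff recursion to quadratic time reduces the asymptotic step count by a factor of $n^4 = 1.6 \times 10^{9}$, and going from quartic to quadratic time by a factor of $n^2 = 4 \times 10^{4}$.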

RESULTS

Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE is the first method to transfer the Sankoff algorithm completely to the lightweight energy model. Compared with LocARNA, SPARSE achieves similar alignment quality and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low-sequence-identity instances substantially more accurately than RAF, which uses sequence-based heuristics.
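The abstract does not spell out SPARSE's sparsification rules. As a hedged illustration of why filtering on structure-ensemble properties can shrink the search space, the sketch below applies a simple base-pair probability cutoff, the kind of filter LocARNA uses; it is not SPARSE's actual criterion. The names `sparsify_pairs`, `max_pairs_per_base` and `p_min`, and the toy probabilities, are hypothetical. The sketch relies on one general fact: the probabilities of all pairs involving a given base sum to at most 1, so at most 1/p_min pairs per base can survive a cutoff p_min.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def sparsify_pairs(bpp: Dict[Pair, float], p_min: float) -> List[Pair]:
    """Keep only base pairs (i, j) whose ensemble probability is >= p_min.

    Hypothetical helper: a LocARNA-style probability cutoff, used here only
    to illustrate ensemble-based sparsification (not SPARSE's exact rule).
    """
    return sorted(pair for pair, p in bpp.items() if p >= p_min)


def max_pairs_per_base(pairs: List[Pair]) -> int:
    """Largest number of retained pairs that share a single base."""
    counts: Dict[int, int] = {}
    for i, j in pairs:
        counts[i] = counts.get(i, 0) + 1
        counts[j] = counts.get(j, 0) + 1
    return max(counts.values(), default=0)


# Toy base-pair probabilities (invented for illustration, not folding output).
bpp_a = {(1, 10): 0.9, (2, 9): 0.85, (3, 8): 0.4, (1, 8): 0.05, (2, 7): 0.002}
bpp_b = {(1, 12): 0.7, (2, 11): 0.6, (4, 9): 0.01, (3, 10): 0.3}

p_min = 0.1
kept_a = sparsify_pairs(bpp_a, p_min)
kept_b = sparsify_pairs(bpp_b, p_min)

# Each retained pair contributes >= p_min to a per-base sum that is <= 1,
# so no base can keep more than floor(1 / p_min) pairs.
assert max_pairs_per_base(kept_a) <= int(1 / p_min)

# Candidate arc matches (one retained pair from each RNA): |kept_a| * |kept_b|.
print(kept_a, kept_b, len(kept_a) * len(kept_b))
# prints: [(1, 10), (2, 9), (3, 8)] [(1, 12), (2, 11), (3, 10)] 9
```

In this toy case the low-probability pairs are discarded, leaving 3 pairs per RNA and 9 candidate arc matches. The per-base bound is what keeps the number of surviving pairs linear in sequence length for a fixed cutoff, rather than quadratic.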

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c62/4514930/b7f2e1c0ee59/btv185f1p.jpg

Similar articles

SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics.

Bioinformatics. 2015 Aug 1;31(15):2489-96. doi: 10.1093/bioinformatics/btv185. Epub 2015 Apr 2.

RNA structural alignments, part I: Sankoff-based approaches for structural alignments.

Methods Mol Biol. 2014;1097:275-90. doi: 10.1007/978-1-62703-709-9_13.

Fast and accurate structure probability estimation for simultaneous alignment and folding of RNAs with Markov chains.

Algorithms Mol Biol. 2020 Nov 13;15(1):19. doi: 10.1186/s13015-020-00179-w.

ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs.

BMC Bioinformatics. 2014 Dec 31;15(1):404. doi: 10.1186/s12859-014-0404-0.

A faster algorithm for simultaneous alignment and folding of RNA.

J Comput Biol. 2010 Aug;17(8):1051-65. doi: 10.1089/cmb.2009.0197.

LocARNA 2.0: Versatile Simultaneous Alignment and Folding of RNAs.

Methods Mol Biol. 2024;2726:235-254. doi: 10.1007/978-1-0716-3519-3_10.

Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints.

BMC Bioinformatics. 2006 Sep 4;7:400. doi: 10.1186/1471-2105-7-400.

Practicality and time complexity of a sparsified RNA folding algorithm.

J Bioinform Comput Biol. 2012 Apr;10(2):1241007. doi: 10.1142/S0219720012410077.

Murlet: a practical multiple alignment tool for structural RNA sequences.

Bioinformatics. 2007 Jul 1;23(13):1588-98. doi: 10.1093/bioinformatics/btm146. Epub 2007 Apr 25.

CARNA--alignment of RNA structure ensembles.

Nucleic Acids Res. 2012 Jul;40(Web Server issue):W49-53. doi: 10.1093/nar/gks491. Epub 2012 Jun 11.

Cited by

ECSFinder: optimized prediction of evolutionarily conserved RNA secondary structures from genome sequences.

Nucleic Acids Res. 2025 Aug 11;53(15). doi: 10.1093/nar/gkaf780.

REDalign: accurate RNA structural alignment using residual encoder-decoder network.

BMC Bioinformatics. 2024 Nov 5;25(1):346. doi: 10.1186/s12859-024-05956-7.

LocARNA 2.0: Versatile Simultaneous Alignment and Folding of RNAs.

Methods Mol Biol. 2024;2726:235-254. doi: 10.1007/978-1-0716-3519-3_10.

DEBFold: Computational Identification of RNA Secondary Structures for Sequences across Structural Families Using Deep Learning.

J Chem Inf Model. 2024 May 13;64(9):3756-3766. doi: 10.1021/acs.jcim.4c00458. Epub 2024 Apr 22.

ConsAlign: simultaneous RNA structural aligner based on rich transfer learning and thermodynamic ensemble model of alignment scoring.

Bioinformatics. 2023 May 4;39(5). doi: 10.1093/bioinformatics/btad255.

Network-Based Structural Alignment of RNA Sequences Using TOPAS.

Methods Mol Biol. 2023;2586:147-162. doi: 10.1007/978-1-0716-2768-6_9.

SSRTool: A web tool for evaluating RNA secondary structure predictions based on species-specific functional interpretability.

Comput Struct Biotechnol J. 2022 May 18;20:2473-2483. doi: 10.1016/j.csbj.2022.05.028. eCollection 2022.

Informative RNA base embedding for RNA structural alignment and clustering by deep representation learning.

NAR Genom Bioinform. 2022 Feb 22;4(1):lqac012. doi: 10.1093/nargab/lqac012. eCollection 2022 Mar.

LaRA 2: parallel and vectorized program for sequence-structure alignment of RNA sequences.

BMC Bioinformatics. 2022 Jan 6;23(1):18. doi: 10.1186/s12859-021-04532-7.

DRAGoM: Classification and Quantification of Noncoding RNA in Metagenomic Data.

Front Genet. 2021 May 5;12:669495. doi: 10.3389/fgene.2021.669495. eCollection 2021.
