一种用于在长基因组序列上搜索简单假结的高效比对算法。

An efficient alignment algorithm for searching simple pseudoknots over long genomic sequence.

机构信息

Department of Computer Science, The University of Hong Kong, Rm 301, Chow Yei Ching Building, Pokfulam Road, Hong Kong.

出版信息

IEEE/ACM Trans Comput Biol Bioinform. 2012 Nov-Dec;9(6):1629-38. doi: 10.1109/TCBB.2012.104.

DOI:10.1109/TCBB.2012.104

Abstract

Structural alignment has been shown to be an effective computational method to identify structural noncoding RNA(ncRNA) candidates as ncRNAs are known to be conserved in secondary structures. However, the complexity of the structural alignment algorithms becomes higher when the structure has pseudoknots. Even for the simplest type of pseudoknots (simple pseudoknots), the fastest algorithm runs in O(mn3) time, where m, n are the length of the query ncRNA (with known structure) and the length of the target sequence (with unknown structure), respectively. In practice, we are usually given a long DNA sequence and we try to locate regions in the sequence for possible candidates of a particular ncRNA. Thus, we need to run the structural alignment algorithm on every possible region in the long sequence. For example, finding candidates for a known ncRNA of length 100 on a sequence of length 50,000, it takes more than one day. In this paper, we provide an efficient algorithm to solve the problem for simple pseudoknots and it is shown to be 10 times faster. The speedup stems from an effective pruning strategy consisting of the computation of a lower bound score for the optimal alignment and an estimation of the maximum score that a candidate can achieve to decide whether to prune the current candidate or not.

摘要

结构比对已被证明是一种有效的计算方法，可以识别结构非编码 RNA(ncRNA)候选物，因为已知 ncRNA 在二级结构中是保守的。然而，当结构具有假结时，结构比对算法的复杂性会更高。即使是最简单类型的假结（简单假结），最快的算法也需要在 O(mn3)时间内运行，其中 m、n 分别是查询 ncRNA（具有已知结构）和目标序列（具有未知结构）的长度。在实践中，我们通常会得到一个长的 DNA 序列，并尝试在序列中找到特定 ncRNA 的可能候选区域。因此，我们需要在长序列的每个可能区域上运行结构比对算法。例如，在长度为 50000 的序列上查找已知长度为 100 的 ncRNA 的候选物，需要一天多的时间。在本文中，我们提供了一种针对简单假结的有效算法，其速度提高了 10 倍。加速源于一种有效的剪枝策略，该策略包括计算最优比对的下界得分和估计候选物可以达到的最大得分，以决定是否剪枝当前候选物。

相似文献

An efficient alignment algorithm for searching simple pseudoknots over long genomic sequence.一种用于在长基因组序列上搜索简单假结的高效比对算法。

IEEE/ACM Trans Comput Biol Bioinform. 2012 Nov-Dec;9(6):1629-38. doi: 10.1109/TCBB.2012.104.

Memory efficient algorithms for structural alignment of RNAs with pseudoknots.具有假结的 RNA 结构比对的内存高效算法。

IEEE/ACM Trans Comput Biol Bioinform. 2012 Jan-Feb;9(1):161-8. doi: 10.1109/TCBB.2011.66. Epub 2011 Mar 30.

Chain-RNA: a comparative ncRNA search tool based on the two-dimensional chain algorithm.链 RNA：一种基于二维链算法的比较 ncRNA 搜索工具。

IEEE/ACM Trans Comput Biol Bioinform. 2013 Mar-Apr;10(2):274-85. doi: 10.1109/TCBB.2012.137.

Structural alignment of RNA with complex pseudoknot structure.具有复杂假结结构的RNA的结构比对

J Comput Biol. 2011 Jan;18(1):97-108. doi: 10.1089/cmb.2009.0275.

Efficient known ncRNA search including pseudoknots.高效已知 ncRNA 搜索包括假结。

BMC Bioinformatics. 2013;14 Suppl 2(Suppl 2):S25. doi: 10.1186/1471-2105-14-S2-S25. Epub 2013 Jan 21.

Structural alignment of pseudoknotted RNA.假结RNA的结构比对

J Comput Biol. 2008 Jun;15(5):489-504. doi: 10.1089/cmb.2007.0214.

Memory efficient alignment between RNA sequences and stochastic grammar models of pseudoknots.RNA序列与假结的随机语法模型之间的内存高效比对。

Int J Bioinform Res Appl. 2006;2(3):289-304. doi: 10.1504/IJBRA.2006.010606.

Lifting prediction to alignment of RNA pseudoknots.将RNA假结的预测提升到比对层面。

J Comput Biol. 2010 Mar;17(3):429-42. doi: 10.1089/cmb.2009.0168.

RNASAlign: RNA structural alignment system.RNASAlign：RNA 结构比对系统。

Bioinformatics. 2011 Aug 1;27(15):2151-2. doi: 10.1093/bioinformatics/btr338. Epub 2011 Jun 8.

Structural alignment of RNA with triple helix structure.具有三螺旋结构的RNA的结构比对。

J Comput Biol. 2012 Apr;19(4):365-78. doi: 10.1089/cmb.2010.0052.

一种用于在长基因组序列上搜索简单假结的高效比对算法。

An efficient alignment algorithm for searching simple pseudoknots over long genomic sequence.

机构信息

Department of Computer Science, The University of Hong Kong, Rm 301, Chow Yei Ching Building, Pokfulam Road, Hong Kong.

出版信息

IEEE/ACM Trans Comput Biol Bioinform. 2012 Nov-Dec;9(6):1629-38. doi: 10.1109/TCBB.2012.104.

DOI:10.1109/TCBB.2012.104

PMID:22848134

Abstract

摘要

Suppr 超能文献

文献检索

文件翻译

深度研究

Suppr 超能文献

文献检索

文件翻译

深度研究

一种用于在长基因组序列上搜索简单假结的高效比对算法。

An efficient alignment algorithm for searching simple pseudoknots over long genomic sequence.

机构信息

出版信息

相似文献

一种用于在长基因组序列上搜索简单假结的高效比对算法。

An efficient alignment algorithm for searching simple pseudoknots over long genomic sequence.

机构信息

出版信息

相似文献