Höchsmann Matthias, Töller Thomas, Giegerich Robert, Kurtz Stefan
International Graduate School in Bioinformatics and Genome Research, University of Bielefeld, 33501 Bielefeld, Germany.
Proc IEEE Comput Soc Bioinform Conf. 2003;2:159-68.
We present a systematic treatment of alignment distance and local similarity algorithms on trees and forests. We build upon the tree alignment algorithm for ordered trees given by Jiang et. al (1995) and extend it to calculate local forest alignments, which is essential for finding local similar regions in RNA secondary structures. The time complexity of our algorithm is O(|F(1)| |F(2) deg(F(1)) deg(F(2)) (deg(F(1)) + deg(F(2))) where |F(i)| is the number of nodes in forest F(i) and deg (F(i)) is the degree of F(i). We provide carefully engineered dynamic programming implementations using dense, two-dimensional tables which considerably reduces the space requirement. We suggest a new representation of RNA secondary structures as forests that allow reasonable scoring of edit operations on RNA secondary structures. The comparison of RNA secondary structures is facilitated by a new visualization technique for RNA secondary structure alignments. Finally, we show how potential regulatory motifs can be discovered solely by their structural preservation, and independent of their sequence conservation and position.
我们对树和森林上的比对距离和局部相似性算法进行了系统的处理。我们基于Jiang等人(1995年)给出的有序树的树比对算法,并将其扩展以计算局部森林比对,这对于在RNA二级结构中找到局部相似区域至关重要。我们算法的时间复杂度为O(|F(1)| |F(2)| deg(F(1)) deg(F(2)) (deg(F(1)) + deg(F(2))),其中|F(i)|是森林F(i)中的节点数,deg(F(i))是F(i)的度。我们使用密集的二维表精心设计了动态规划实现,这大大降低了空间需求。我们提出了一种将RNA二级结构表示为森林的新方法,这使得对RNA二级结构上的编辑操作进行合理评分成为可能。一种用于RNA二级结构比对的新可视化技术促进了RNA二级结构的比较。最后,我们展示了如何仅通过潜在调控基序的结构保留来发现它们,而不依赖于它们的序列保守性和位置。