Di Pietro C, Di Pietro V, Emmanuele G, Ferro A, Maugeri T, Modica E, Pigola G, Pulvirenti A, Purrello M, Ragusa M, Scalia M, Shasha D, Travali S, Zimmitti V
Dipartimento di Scienze Biomediche, Università di Catania.
Proc IEEE Comput Soc Bioinform Conf. 2003;2:326-36.
In this paper we present a new Multiple Sequence Alignment (MSA) algorithm called AntiClusAl. The method makes use of the commonly use idea of aligning homologous sequences belonging to classes generated by some clustering algorithm, and then continue the alignment process ina bottom-up way along a suitable tree structure. The final result is then read at the root of the tree. Multiple sequence alignment in each cluster makes use of the progressive alignment with the 1-median (center) of the cluster. The 1-median of set S of sequences is the element of S which minimizes the average distance from any other sequence in S. Its exact computation requires quadratic time. The basic idea of our proposed algorithm is to make use of a simple and natural algorithmic technique based on randomized tournaments which has been successfully applied to large size search problems in general metric spaces. In particular a clustering algorithm called Antipole tree and an approximate linear 1-median computation are used. Our algorithm compared with Clustal W, a widely used tool to MSA, shows a better running time results with fully comparable alignment quality. A successful biological application showing high aminoacid conservation during evolution of Xenopus laevis SOD2 is also cited.
在本文中,我们提出了一种名为AntiClusAl的新的多序列比对(MSA)算法。该方法利用了一种常用的思路,即对属于由某种聚类算法生成的类别的同源序列进行比对,然后沿着合适的树结构以自底向上的方式继续比对过程。最终结果在树的根节点处读取。每个聚类中的多序列比对利用与聚类的1-中位数(中心)的渐进比对。序列集合S的1-中位数是S中的元素,它使与S中任何其他序列的平均距离最小化。其精确计算需要二次时间。我们提出的算法的基本思想是利用一种基于随机锦标赛的简单自然的算法技术,该技术已成功应用于一般度量空间中的大规模搜索问题。特别地,使用了一种名为反极树的聚类算法和一种近似线性的1-中位数计算。我们的算法与广泛用于多序列比对的工具Clustal W相比,在比对质量完全可比的情况下,显示出更好的运行时间结果。还引用了一个成功的生物学应用,该应用显示了非洲爪蟾SOD2在进化过程中的高氨基酸保守性。