Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel.
Mol Biol Evol. 2010 Aug;27(8):1759-67. doi: 10.1093/molbev/msq066. Epub 2010 Mar 5.
Multiple sequence alignment (MSA) is the basis for a wide range of comparative sequence analyses, from molecular phylogenetics to 3D structure prediction. Sophisticated algorithms have been developed for sequence alignment, but in practice many errors can be expected, and extensive portions of an MSA are often unreliable. Hence, it is imperative to understand and characterize the various sources of error in MSAs and to quantify site-specific alignment confidence. In this paper, we show that uncertainty in the guide tree used by progressive alignment methods is a major source of alignment uncertainty. We use this insight to develop a novel method for quantifying the robustness of each alignment column to guide tree uncertainty. We build on the widely used bootstrap method for perturbing the phylogenetic tree. Specifically, we generate a collection of bootstrap trees and use each as a guide tree in the alignment algorithm, thus producing a set of MSAs. We then test the consistency of every column of the MSA obtained from the unperturbed guide tree against this set of MSAs. We name this measure the "GUIDe tree based AligNment ConfidencE" (GUIDANCE) score. Using the BAliBASE (Benchmark Alignment dataBASE) benchmark as well as simulation studies, we show that GUIDANCE scores accurately identify errors in MSAs. Additionally, we compare our results with the previously published Heads-or-Tails score and show that the GUIDANCE score is a better predictor of unreliably aligned regions.
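The column-consistency test described in the abstract can be illustrated with a minimal Python sketch. It assumes the base MSA and the perturbed MSAs have already been produced by an external aligner, contain the same sequences in the same row order, and use `-` as the gap character. The function names (`column_signatures`, `column_consistency`) and the scoring rule (the fraction of perturbed MSAs that reproduce a column exactly) are illustrative assumptions, not necessarily the scoring scheme implemented in GUIDANCE.

```python
from typing import List, Tuple

Column = Tuple[int, ...]


def column_signatures(msa: List[str]) -> List[Column]:
    """Describe each column by the residue index it holds per sequence (-1 = gap)."""
    if not msa or any(len(row) != len(msa[0]) for row in msa):
        raise ValueError("MSA rows must be non-empty and of equal length")
    next_residue = [0] * len(msa)  # running residue counter for each sequence
    signatures = []
    for col in range(len(msa[0])):
        sig = []
        for row, seq in enumerate(msa):
            if seq[col] == "-":
                sig.append(-1)
            else:
                sig.append(next_residue[row])
                next_residue[row] += 1
        signatures.append(tuple(sig))
    return signatures


def column_consistency(base_msa: List[str],
                       perturbed_msas: List[List[str]]) -> List[float]:
    """For each base column, the fraction of perturbed MSAs containing it unchanged.

    Hypothetical scoring rule for illustration only.
    """
    if not perturbed_msas:
        raise ValueError("at least one perturbed MSA is required")
    perturbed_sets = [set(column_signatures(m)) for m in perturbed_msas]
    scores = []
    for sig in column_signatures(base_msa):
        hits = sum(sig in cols for cols in perturbed_sets)
        scores.append(hits / len(perturbed_sets))
    return scores


if __name__ == "__main__":
    base = ["AC-GT", "ACAGT"]          # alignment from the unperturbed guide tree
    perturbed = [
        ["AC-GT", "ACAGT"],            # same alignment from one bootstrap tree
        ["ACG-T", "ACAGT"],            # gap placed differently under another tree
    ]
    print(column_consistency(base, perturbed))
```

In this toy example, the two columns whose gap placement differs between the perturbed alignments receive a lower score than the columns that every alignment reproduces, which is the kind of site-specific signal the abstract describes.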