Landan Giddy, Graur Dan
Department of Biology & Biochemistry, University of Houston, Houston, TX 77204, USA.
Pac Symp Biocomput. 2008:15-24.
The question of multiple sequence alignment quality has received much attention from developers of alignment methods. Less forthcoming, however, are practical measures for quantifying alignment reliability in real-life settings. Here, we present a method to identify and quantify uncertainties in multiple sequence alignments. The proposed method is based on the observation that, under any objective function or evolutionary model, some portions of a reconstructed alignment are uniquely optimal, while other portions constitute an arbitrary choice from a set of co-optimal alternatives. The co-optimal portions of reconstructed alignments are thus at most half as reliable as the uniquely optimal portions. For pairwise alignments, this irreducible uncertainty can be quantified by comparing the high-road and low-road alignments, which form the co-optimality envelope for the two sequences. We extend this approach to progressive multiple sequence alignment by forming a large set of equally likely co-optimal alignments that bracket the co-optimality space. This set can then be used to derive a series of local reliability measures for any candidate alignment. The resulting reliability measures can be used as predictors and classifiers of alignment errors. We report a simulation study that demonstrates the superior power of the proposed local reliability measures.
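The pairwise high-road/low-road comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Needleman-Wunsch global alignment with a linear gap penalty, and it assumes that the "high road" and "low road" correspond to two opposite tie-breaking orders in the traceback; the function names, scoring values, and tie-breaking orders are all hypothetical choices made for illustration. Residue pairs found on both roads are treated as uniquely optimal, and pairs found on only one road as co-optimal.

```python
# Illustrative sketch only: scoring scheme and tie-breaking orders are assumptions,
# not taken from the paper.
MATCH, MISMATCH, GAP = 1, -1, -1  # hypothetical integer scores, linear gap penalty


def sub_score(x, y):
    """Substitution score for aligning residue x with residue y."""
    return MATCH if x == y else MISMATCH


def nw_matrix(a, b):
    """Fill the Needleman-Wunsch score matrix for global alignment of a and b."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + GAP,   # residue of a aligned to a gap
                score[i][j - 1] + GAP,   # residue of b aligned to a gap
            )
    return score


def traceback_pairs(a, b, score, order):
    """Trace one optimal path, trying moves in `order` whenever several are optimal.

    Returns the set of aligned residue index pairs (i, j) on that path.
    """
    i, j = len(a), len(b)
    pairs = set()
    while i > 0 or j > 0:
        for move in order:
            if (move == "diag" and i > 0 and j > 0 and
                    score[i][j] == score[i - 1][j - 1] + sub_score(a[i - 1], b[j - 1])):
                pairs.add((i - 1, j - 1))
                i, j = i - 1, j - 1
                break
            if move == "up" and i > 0 and score[i][j] == score[i - 1][j] + GAP:
                i -= 1
                break
            if move == "left" and j > 0 and score[i][j] == score[i][j - 1] + GAP:
                j -= 1
                break
    return pairs


def road_comparison(a, b):
    """Return (shared, high, low) sets of aligned pairs under the two tie-breaking orders."""
    score = nw_matrix(a, b)
    high = traceback_pairs(a, b, score, ("up", "diag", "left"))   # assumed "high road"
    low = traceback_pairs(a, b, score, ("left", "diag", "up"))    # assumed "low road"
    return high & low, high, low


shared, high, low = road_comparison("AAG", "AG")
print("high road pairs:", sorted(high))
print("low road pairs: ", sorted(low))
print("shared pairs:   ", sorted(shared))
```

For "AAG" versus "AG", the single A of the second sequence can pair with either A of the first at the same score, so the two roads disagree on that pair and agree only on the G-G pair. The fraction of pairs shared by both roads is one simple way to summarize agreement in this sketch; the paper's own local reliability measures may be defined differently.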