Holmes I, Bruno W J
Group T10, Los Alamos National Laboratory, NM 87545, USA.
Bioinformatics. 2001 Sep;17(9):803-20. doi: 10.1093/bioinformatics/17.9.803.
We review proposed syntheses of probabilistic sequence alignment, profiling and phylogeny. We develop a multiple alignment algorithm for Bayesian inference in the links model proposed by Thorne et al. (1991, J. Mol. Evol., 33, 114-124). The algorithm, described in detail in Section 3, samples from and/or maximizes the posterior distribution over multiple alignments for any number of DNA or protein sequences, conditioned on a phylogenetic tree. The individual sampling and maximization steps of the algorithm require no more computational resources than pairwise alignment.
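In the links model of Thorne et al. (1991), each residue is a "link" that is deleted at rate μ and inserts a new residue immediately to its right at rate λ. An immortal link at the left end keeps the sequence from vanishing. The sketch below computes two quantities of that birth-death process for a branch of length t. The first is the survival probability α(t) = e^(-μt). The second is the geometric ratio β(t) = λ(1 - e^((λ-μ)t)) / (μ - λe^((λ-μ)t)), which governs how many inserted residues descend from a link. It also gives the geometric equilibrium length distribution. The function names and the restriction 0 < λ < μ are illustrative assumptions, not Handel's code.

```python
import math


def tkf91_link_probabilities(lam, mu, t):
    """Return (alpha, beta) for the links model on a branch of length t.

    Hypothetical helper (not Handel's API); assumes 0 < lam < mu, t >= 0.
    alpha: probability that an ancestral residue (link) survives to time t.
    beta:  ratio of the geometric distribution of residues inserted
           to the right of a link (tends to lam/mu as t grows).
    """
    if not (0.0 < lam < mu):
        raise ValueError("require 0 < lam < mu for an equilibrium")
    if t < 0.0:
        raise ValueError("branch length must be non-negative")
    e = math.exp((lam - mu) * t)
    alpha = math.exp(-mu * t)
    beta = lam * (1.0 - e) / (mu - lam * e)
    return alpha, beta


def equilibrium_length_prob(lam, mu, n):
    """P(sequence length == n) at equilibrium: geometric with ratio lam/mu."""
    r = lam / mu
    return (1.0 - r) * r ** n


alpha, beta = tkf91_link_probabilities(1.0, 2.0, 1.0)
print(f"alpha={alpha:.4f} beta={beta:.4f}")
```

At t = 0 this gives α = 1 and β = 0 (no change). As t grows, β approaches λ/μ, the ratio of the equilibrium length distribution.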
We present a software implementation (Handel) of our algorithm and report test results on (i) simulated data sets and (ii) the structurally informed protein alignments of BAliBASE (Thompson et al., 1999, Nucleic Acids Res., 27, 2682-2690).
We find that the mean sum-of-pairs score (a measure of residue-pair correspondence) for the BAliBASE alignments is only 13% lower for Handel than for CLUSTALW (Thompson et al., 1994, Nucleic Acids Res., 22, 4673-4680), despite the relative simplicity of the links model (CLUSTALW uses affine gap scores and increased penalties for indels in hydrophobic regions). With reference to these benchmarks, we discuss potential improvements to the links model and implications for Bayesian multiple alignment and phylogenetic profiling.
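The sum-of-pairs score can be illustrated as follows. Collect every pair of residues placed in the same column of the reference alignment, then report the fraction of those pairs that the test alignment also places together. This minimal sketch assumes that alignments are lists of equal-length gapped strings, with '-' for gaps, and that rows appear in the same sequence order. The function names are hypothetical. It ignores refinements such as restricting the score to BAliBASE core blocks, so it may not match the benchmark's own scoring program exactly.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of aligned residue pairs (seq_i, res_i, seq_j, res_j), i < j.

    Residue indices count ungapped positions within each sequence.
    """
    if not alignment:
        return set()
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows must have the same length")
    next_residue = [0] * len(alignment)  # next ungapped index per sequence
    pairs = set()
    for col in range(len(alignment[0])):
        present = []  # (sequence, residue index) for non-gap cells, in row order
        for s, row in enumerate(alignment):
            if row[col] != "-":
                present.append((s, next_residue[s]))
                next_residue[s] += 1
        for (si, ri), (sj, rj) in combinations(present, 2):
            pairs.add((si, ri, sj, rj))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs that are also aligned in test."""
    ungapped = lambda aln: [row.replace("-", "") for row in aln]
    if ungapped(test) != ungapped(reference):
        raise ValueError("test and reference must align the same sequences")
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(aligned_pairs(test) & ref_pairs) / len(ref_pairs)


reference = ["AC-GT", "ACTGT"]
test = ["ACG-T", "ACTGT"]
print(sum_of_pairs_score(test, reference))  # 3 of 4 reference pairs recovered
```

In this toy example the reference aligns four residue pairs. The test alignment misplaces the G of the first sequence, so it recovers three of them, giving a score of 0.75.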
The source code to Handel is freely distributed on the Internet at http://www.biowiki.org/Handel under the terms of the GNU General Public License (GPL, 2000, http://www.fsf.org./copyleft/gpl.html).