Zoologisches Forschungsmuseum A. Koenig, Adenauerallee 160, 53113 Bonn, Germany.
Front Zool. 2010 Mar 31;7:10. doi: 10.1186/1742-9994-7-10.
Alignment masking, the exclusion of alignment blocks prior to tree reconstruction, has been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well-defined methods to identify randomness in sequence alignments has prevented its routine application. In this study, we compared the effects on tree reconstruction of two profiling methods, each used in combination with alignment masking: the most commonly used method (GBLOCKS), which applies a predefined set of rules, and a new approach (ALISCORE) based on Monte Carlo resampling within a sliding window. The comparison covered different data sets and alignment methods. While GBLOCKS excludes variable sections above a certain threshold whose choice is left arbitrary, the ALISCORE algorithm is free of a priori rating of parameter space and therefore more objective.
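The masking step itself can be illustrated with a minimal sketch. It assumes that a profiling method has already produced a list of column indices to exclude; the function name `mask_alignment` and the dictionary representation of the alignment are hypothetical choices for illustration and are not taken from GBLOCKS or ALISCORE.

```python
def mask_alignment(alignment, flagged_columns):
    """Return a copy of the alignment with the flagged columns removed.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    flagged_columns: iterable of 0-based column indices to exclude
    (hypothetical output of a profiling step).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("all aligned sequences must have the same length")
    drop = set(flagged_columns)
    # Keep every column whose index was not flagged by the profiling step.
    return {
        name: "".join(ch for i, ch in enumerate(seq) if i not in drop)
        for name, seq in alignment.items()
    }


masked = mask_alignment({"t1": "ACGT-A", "t2": "ACCTTA"}, [2, 4])
```

The masked alignment, rather than the original one, would then be passed on to tree reconstruction.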
ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling yields an even distribution of scores for randomly similar sequences, against which the randomness of the observed sequence similarity is assessed. When tested on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. In addition, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the greatest decrease in conflict.
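The general idea of scoring randomness by Monte Carlo resampling within a sliding window can be sketched for a single pair of aligned sequences. This is a simplified illustration under stated assumptions, not the ALISCORE implementation: a plain match/mismatch score stands in for the proportional model and empirical substitution matrices, residues are resampled with replacement from each sequence, and the window size, number of resamples, and quantile are arbitrary illustrative values. All function and parameter names are hypothetical.

```python
import random


def window_score(a, b):
    """Assumed toy score: +1 for identical residues, -1 otherwise."""
    return sum(1 if x == y else -1 for x, y in zip(a, b))


def random_like_windows(seq_a, seq_b, window=6, resamples=200,
                        quantile=0.95, seed=0):
    """Flag windows whose similarity does not exceed a resampled null cutoff."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    rng = random.Random(seed)
    pool_a, pool_b = list(seq_a), list(seq_b)
    # Null distribution: scores of window-sized sequences drawn at random
    # (with replacement) from the residues of each sequence, so the null
    # reflects the composition of the observed data.
    null = sorted(
        window_score(rng.choices(pool_a, k=window),
                     rng.choices(pool_b, k=window))
        for _ in range(resamples)
    )
    cutoff = null[int(quantile * (resamples - 1))]
    flags = []
    for start in range(len(seq_a) - window + 1):
        observed = window_score(seq_a[start:start + window],
                                seq_b[start:start + window])
        # A window counts as random-like if it scores no better than the cutoff.
        flags.append(observed <= cutoff)
    return flags


flags_same = random_like_windows("ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGT")
flags_diff = random_like_windows("AAAAAAAAAA", "CCCCCCCCCC")
```

In this sketch, windows flagged as random-like would be candidates for masking. Because the null distribution is drawn from the sequences themselves, the cutoff adapts to their composition instead of relying on a fixed variability threshold.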
Alignment masking improves the signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction. Given the robust performance of alignment profiling, alignment masking should be used routinely to improve tree reconstructions. Parametric methods of alignment profiling can easily be extended to more complex likelihood-based models of sequence evolution, which opens the possibility of further improvements.