Zhu J, Liu J S, Lawrence C E
Wadsworth Center for Laboratories and Research, Albany, NY, USA.
Bioinformatics. 1998;14(1):25-39. doi: 10.1093/bioinformatics/14.1.25.
The selection of a scoring matrix and gap penalty parameters continues to be an important problem in sequence alignment. We describe here an algorithm, the 'Bayes block aligner', which bypasses this requirement. Instead of requiring a fixed set of parameter settings, this algorithm returns the Bayesian posterior probability for the number of gaps and for the scoring matrices in any series of interest. Furthermore, instead of returning the single best alignment for the chosen parameter settings, this algorithm returns the posterior distribution of all alignments considering the full range of gapping and scoring matrices selected, weighting each in proportion to its probability based on the data. We compared the Bayes aligner with the popular Smith-Waterman algorithm, using parameter settings from the literature that had been optimized for the identification of structural neighbors, and found that the Bayes aligner correctly identified more structural neighbors. In a detailed examination of the alignment of a pair of kinase sequences and a pair of GTPase sequences, we illustrate the algorithm's potential to identify subsequences that are conserved to different degrees. In addition, this example shows that the Bayes aligner returns an alignment-free assessment of the distance between a pair of sequences.
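The idea of a joint posterior over the number of gaps and the scoring matrix can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on assumptions the abstract does not state: an alignment is modelled as a set of co-linear ungapped blocks separated by gaps, the priors over the block count, the scoring matrix, and the alignments with a given block count are all uniform, and each "scoring matrix" is a toy match/mismatch odds function over a uniform four-letter alphabet. The names `block_sums`, `posterior`, `strict`, and `lenient` are hypothetical.

```python
def block_sums(x, y, max_blocks, odds):
    """Return [T_0, ..., T_K], where T_k is the sum, over every alignment of
    x and y made of exactly k ungapped blocks, of the product of the odds
    of its aligned residue pairs (dynamic programming, no enumeration)."""
    n, m = len(x), len(y)
    totals = [1.0]  # k = 0: only the empty alignment, with product 1
    # s_prev[i][j]: sum over alignments of x[:i], y[:j] with k-1 blocks
    # w_prev[i][j]: same, restricted to a last block ending exactly at (i, j)
    s_prev = [[1.0] * (m + 1) for _ in range(n + 1)]
    w_prev = [[0.0] * (m + 1) for _ in range(n + 1)]
    for _ in range(max_blocks):
        w = [[0.0] * (m + 1) for _ in range(n + 1)]
        s = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                # Extend the current block along the diagonal ...
                extend = w[i - 1][j - 1]
                # ... or open a new block; the previous block must not end
                # at (i-1, j-1), otherwise the two would be a single block.
                start = s_prev[i - 1][j - 1] - w_prev[i - 1][j - 1]
                w[i][j] = odds(x[i - 1], y[j - 1]) * (extend + start)
                # Two-dimensional prefix sum over all block end points.
                s[i][j] = s[i - 1][j] + s[i][j - 1] - s[i - 1][j - 1] + w[i][j]
        totals.append(s[n][m])
        s_prev, w_prev = s, w
    return totals


def posterior(x, y, matrices, max_blocks):
    """Joint posterior over (block count, matrix name), assuming uniform
    priors over both and over the alignments with a given block count."""
    # With odds fixed at 1, the same recursion counts the alignments.
    counts = block_sums(x, y, max_blocks, lambda a, b: 1.0)
    joint = {}
    for name, odds in matrices.items():
        sums = block_sums(x, y, max_blocks, odds)
        for k in range(max_blocks + 1):
            if counts[k] > 0:  # skip block counts that admit no alignment
                joint[(k, name)] = sums[k] / counts[k]
    z = sum(joint.values())
    return {key: value / z for key, value in joint.items()}


# Toy odds functions (assumed, not from the paper). With a uniform
# four-letter alphabet each satisfies 0.25 * match + 0.75 * mismatch = 1.
def strict(a, b):
    return 3.4 if a == b else 0.2


def lenient(a, b):
    return 2.2 if a == b else 0.6


if __name__ == "__main__":
    post = posterior("ACGTACGT", "ACGTACGT",
                     {"strict": strict, "lenient": lenient}, max_blocks=2)
    for (k, name), p in sorted(post.items()):
        print(f"blocks={k} matrix={name} posterior={p:.4f}")
```

Because the likelihood for each setting is a sum over all alignments rather than the score of one best alignment, the output is a distribution over parameter settings and no single setting has to be chosen in advance. The sketch works with plain floating-point sums, so it is only suitable for short sequences; longer inputs would need rescaling or log-space arithmetic.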