Webb Bobbie-Jo M, Liu Jun S, Lawrence Charles E
The Wadsworth Center for Laboratories and Research, New York State Department of Health, Albany, NY 12201, USA.
Nucleic Acids Res. 2002 Mar 1;30(5):1268-77. doi: 10.1093/nar/30.5.1268.
The Smith-Waterman algorithm yields a single alignment which, although optimal, can be strongly affected by the choice of scoring matrix and gap penalties. In addition, the scores obtained depend on the lengths of the aligned sequences and therefore require a post-analysis conversion. To overcome some of these shortcomings, we developed a Bayesian algorithm for local sequence alignment (BALSA) that takes into account the uncertainty associated with all unknown variables by incorporating in its forward sums a series of scoring matrices, gap parameters and all possible alignments. The algorithm can return both the joint and the marginal optimal alignments, samples of alignments drawn from the posterior distribution, and the posterior probabilities of gap penalties and scoring matrices. Furthermore, it automatically adjusts for variations in sequence lengths. BALSA was compared with SSEARCH, to date the best-performing dynamic programming algorithm for the detection of structural neighbors. On the SCOP databases PDB40D-B and PDB90D-B, BALSA detected 19.8% and 41.3% of remote homologs, respectively, whereas SSEARCH detected 18.4% and 38%, at a rate of 1% errors per query over the databases.
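The idea of a forward sum over all local alignments, combined with a posterior over parameter settings, can be illustrated with a minimal Python sketch. This is not the published BALSA recursion: the three-state (match, gap-in-x, gap-in-y) scheme, the `Params` class, the function names, the toy match/mismatch weights used in place of real scoring matrices, and the uniform prior are all assumptions made for illustration. The forward sum is treated here as an unnormalized marginal likelihood for each parameter setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    """Hypothetical parameter setting, expressed as multiplicative weights."""
    match: float       # weight for aligning two identical residues
    mismatch: float    # weight for aligning two different residues
    gap_open: float    # weight for the first residue of a gap
    gap_extend: float  # weight for each further residue of a gap


def forward_sum(x, y, p):
    """Sum the weights of all non-empty local alignments of x and y.

    Alignments start and end on an aligned pair. Gaps are affine, and a
    gap in one sequence may not be directly followed by a gap in the other.
    """
    n, m = len(x), len(y)
    M = [[0.0] * (m + 1) for _ in range(n + 1)]  # ends with x[i-1] aligned to y[j-1]
    X = [[0.0] * (m + 1) for _ in range(n + 1)]  # ends with x[i-1] opposite a gap
    Y = [[0.0] * (m + 1) for _ in range(n + 1)]  # ends with y[j-1] opposite a gap
    total = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = p.match if x[i - 1] == y[j - 1] else p.mismatch
            # The leading 1.0 starts a new local alignment at (i, j).
            M[i][j] = w * (1.0 + M[i - 1][j - 1] + X[i - 1][j - 1] + Y[i - 1][j - 1])
            X[i][j] = p.gap_open * M[i - 1][j] + p.gap_extend * X[i - 1][j]
            Y[i][j] = p.gap_open * M[i][j - 1] + p.gap_extend * Y[i][j - 1]
            total += M[i][j]  # every alignment ends on an aligned pair
    return total


def posterior_over_params(x, y, settings, prior=None):
    """Posterior probability of each setting, proportional to prior * forward sum."""
    if prior is None:
        prior = [1.0 / len(settings)] * len(settings)  # assumed uniform prior
    joint = [pr * forward_sum(x, y, s) for pr, s in zip(prior, settings)]
    norm = sum(joint)
    return [v / norm for v in joint]


if __name__ == "__main__":
    settings = [
        Params(match=2.0, mismatch=0.5, gap_open=0.1, gap_extend=0.5),
        Params(match=4.0, mismatch=0.2, gap_open=0.05, gap_extend=0.3),
    ]
    print(posterior_over_params("ACGTAC", "ACTTAC", settings))
```

Plain floating-point sums are used to keep the sketch short; for sequences of realistic length the sums would need rescaling or log-space arithmetic to avoid overflow.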