Holmes I, Durbin R
Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, England. ihh,
J Comput Biol. 1998 Fall;5(3):493-504. doi: 10.1089/cmb.1998.5.493.
Algorithms that align biological sequences have inherent statistical limits on the accuracy of the alignments they produce. Using simulations, we measure the accuracy of the standard global dynamic programming method and show that it can be reasonably well modelled by an "edge wander" approximation to the distribution of the optimal scoring path around the correct path in the vicinity of a gap. We also give a table from which accuracy values can be predicted for commonly used scoring schemes and sequence divergences (the PAM and BLOSUM series). Finally, we describe how to calculate the expected accuracy of a given alignment, and show how this can be used to construct an optimal-accuracy alignment algorithm that, in simulated experiments, generates significantly more accurate alignments than standard dynamic programming methods.
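The expected-accuracy idea in the abstract can be illustrated with a minimal sketch. It assumes, and the abstract does not spell this out, that a matrix of posterior match probabilities is already available (for example from a forward-backward pass over a probabilistic alignment model), that the expected accuracy of an alignment is taken as the sum of those posteriors over its aligned pairs, and that the optimal-accuracy alignment is the one maximising that sum. The function names `expected_accuracy` and `optimal_accuracy_alignment` and the example matrix are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Pairs = List[Tuple[int, int]]


def expected_accuracy(pairs: Pairs, post: List[List[float]]) -> float:
    """Expected number of correctly aligned pairs, assuming post[i][j] is the
    posterior probability that residue i of x is aligned to residue j of y."""
    return sum(post[i][j] for i, j in pairs)


def optimal_accuracy_alignment(post: List[List[float]]) -> Tuple[float, Pairs]:
    """Find the alignment that maximises the summed posterior of its pairs.

    Illustrative recursion (an assumption, not quoted from the paper):
        A[i][j] = max(A[i-1][j-1] + post[i-1][j-1], A[i-1][j], A[i][j-1])
    """
    n = len(post)
    m = len(post[0]) if n else 0
    A = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            A[i][j] = max(
                A[i - 1][j - 1] + post[i - 1][j - 1],  # align x_i with y_j
                A[i - 1][j],                           # leave x_i unaligned
                A[i][j - 1],                           # leave y_j unaligned
            )

    # Trace back from the corner to recover the aligned pairs (0-based).
    pairs: Pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if A[i][j] == A[i - 1][j - 1] + post[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif A[i][j] == A[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return A[n][m], pairs


if __name__ == "__main__":
    # Made-up posteriors for a 2-residue x against a 3-residue y.
    post = [
        [0.9, 0.1, 0.0],
        [0.1, 0.2, 0.7],
    ]
    score, pairs = optimal_accuracy_alignment(post)
    print(pairs, round(score, 3), round(expected_accuracy(pairs, post), 3))
```

Because the objective is the summed posterior rather than a substitution-and-gap score, this recursion needs no gap penalties. Any alignment, including one produced by standard dynamic programming, can be passed to `expected_accuracy` for comparison under the same posteriors.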