David R. Cheriton School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada.
IEEE/ACM Trans Comput Biol Bioinform. 2011 May-Jun;8(3):698-709. doi: 10.1109/TCBB.2010.76.
We present a pairwise local aligner, FEAST, which uses two new techniques: a sensitive extension algorithm for identifying homologous subsequences, and a descriptive probabilistic alignment model. We also present a new procedure for training alignment parameters and apply it to the human and mouse genomes, producing a better parameter set for these sequences. Our extension algorithm identifies homologous subsequences by considering all evolutionary histories. It has higher maximum sensitivity than Viterbi extensions and better balances sensitivity against specificity. We model alignments with several submodels, each with unique statistical properties, describing strongly similar and weakly similar regions of homologous DNA. Training parameters using two submodels produces superior alignments, even when we align with only the parameters from the weaker submodel. Our extension algorithm combined with our new parameter set achieves sensitivity 0.59 on synthetic tests. In contrast, LASTZ with default settings achieves sensitivity 0.35 at the same false positive rate. Using the weak submodel as parameters for LASTZ increases its sensitivity to 0.59, but with a high error rate. FEAST is available at http://monod.uwaterloo.ca/feast/.
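The contrast between "considering all evolutionary histories" and a Viterbi extension can be illustrated with a textbook three-state pair HMM: the Forward recursion sums the probability of every alignment path, while Viterbi keeps only the single best path. The sketch below is a minimal illustration under stated assumptions, not FEAST's method. It uses a generic global pair HMM over two short fragments (no local extension, drop-off rule, or multiple submodels), and the parameters `DELTA`, `EPS`, `P_MATCH`, `P_MISMATCH`, and `Q` are hypothetical toy values, not the trained human/mouse parameter set.

```python
import math

# Hypothetical toy parameters for a textbook three-state pair HMM.
# These are NOT FEAST's trained human/mouse parameters.
DELTA = 0.2                          # M -> X and M -> Y gap-open probability
EPS = 0.3                            # X -> X and Y -> Y gap-extend probability
P_MATCH = 0.2                        # emission prob. of each identical pair (4 of them)
P_MISMATCH = (1 - 4 * P_MATCH) / 12  # emission prob. of each non-identical pair (12 of them)
Q = 0.25                             # background prob. of a base aligned to a gap
NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-probabilities."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def pair_hmm(x, y, combine):
    """Log-probability of x and y under the toy pair HMM.

    combine=max gives Viterbi (best single alignment path);
    combine=logsumexp gives Forward (sum over all alignment paths).
    """
    n, m = len(x), len(y)
    # Log-space DP tables: M = aligned pair, X = x[i] vs gap, Y = gap vs y[j].
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0  # start as if in the match state
    l_mm = math.log(1 - 2 * DELTA)  # M -> M
    l_gm = math.log(1 - EPS)        # X -> M or Y -> M
    l_open = math.log(DELTA)        # M -> X or M -> Y
    l_ext = math.log(EPS)           # X -> X or Y -> Y
    l_q = math.log(Q)
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                e = math.log(P_MATCH if x[i - 1] == y[j - 1] else P_MISMATCH)
                M[i][j] = e + combine([M[i - 1][j - 1] + l_mm,
                                       X[i - 1][j - 1] + l_gm,
                                       Y[i - 1][j - 1] + l_gm])
            if i > 0:
                X[i][j] = l_q + combine([M[i - 1][j] + l_open, X[i - 1][j] + l_ext])
            if j > 0:
                Y[i][j] = l_q + combine([M[i][j - 1] + l_open, Y[i][j - 1] + l_ext])
    # No explicit end state: combine the three ways of finishing at (n, m).
    return combine([M[n][m], X[n][m], Y[n][m]])


viterbi = pair_hmm("ACGTACGT", "ACGAACGT", max)
forward = pair_hmm("ACGTACGT", "ACGAACGT", logsumexp)
best_path_share = math.exp(viterbi - forward)  # fraction of total probability on the best path
print(f"Viterbi log-prob: {viterbi:.3f}")
print(f"Forward log-prob: {forward:.3f}")
print(f"Share of probability on the best path: {best_path_share:.3f}")
```

Because Forward adds up every path, its score is never below the Viterbi score; the gap between them reflects how much probability mass lies on alternative alignments that a Viterbi-only extension ignores.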