Kalafus Ken J, Jackson Andrew R, Milosavljevic Aleksandar
Program in Structural and Computational Biology and Molecular Biophysics, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, 77030, USA.
Genome Res. 2004 Apr;14(4):672-8. doi: 10.1101/gr.1963804.
Pash is a computer program for efficient, parallel, all-against-all comparison of very long DNA sequences. Pash implements Positional Hashing, a novel parallelizable method for sequence comparison based on k-mer representation of sequences. The Positional Hashing method decomposes the comparison problem in a way that avoids the quadratic penalty encountered with other sensitive methods and confers inherent low-level parallelism. Furthermore, Positional Hashing allows one to readily and predictably trade off sensitivity against speed. In a simulated comparison task, anchoring computationally mutated reads onto a genome, the sensitivity of Pash was equal to or greater than that of BLAST and BLAT, with Pash outperforming these programs as the reads became shorter and less similar to the genome. Using modest computing resources, we employed Pash for two large-scale sequence comparison tasks: comparison of three mammalian genomes, and anchoring millions of chimpanzee whole-genome shotgun sequencing reads onto the human genome. The results of these comparisons by Pash agree with those computed by other methods that use more than an order of magnitude more computing resources. These results confirm the sensitivity of Positional Hashing.
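The abstract does not spell out how Positional Hashing works internally, so the following is only a minimal sketch of the general idea it builds on: representing sequences as k-mers and using a hash table to anchor a read onto a reference. It is a generic seed-and-vote scheme, not Pash's algorithm. The function names, the toy sequences, and the `step` parameter (a hypothetical stand-in for trading sensitivity for speed by sampling fewer k-mers) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map each k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def anchor_read(read, index, k, step=1):
    """Return (offset, votes) for the best-supported diagonal, or None.

    Each k-mer shared by the read and the reference votes for the offset
    ref_pos - read_pos. A larger `step` samples fewer read k-mers, which
    is faster but less sensitive (hypothetical knob, not a Pash option).
    """
    votes = Counter()
    for read_pos in range(0, len(read) - k + 1, step):
        kmer = read[read_pos:read_pos + k]
        for ref_pos in index.get(kmer, ()):
            votes[ref_pos - read_pos] += 1
    if not votes:
        return None
    return votes.most_common(1)[0]


# Toy data: the read is reference[9:22] with a single substitution.
reference = "ACGTTGCAAGGCTTACGGATCCATGCAAGT"
read = "GGCTTACGCATCC"

index = build_kmer_index(reference, k=4)
best = anchor_read(read, index, k=4)            # every k-mer of the read
sparse = anchor_read(read, index, k=4, step=2)  # every second k-mer
```

In this toy case the mutated read is still anchored at offset 9 in both runs, but the sparser sampling collects fewer supporting k-mers, which illustrates in miniature why sampling density affects sensitivity on shorter or more divergent reads.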