Choi Jeong-Hyeon, Cho Hwan-Gue, Kim Sun
School of Informatics, Indiana University, Bloomington, IN 47408, USA.
Comput Biol Chem. 2005 Jun;29(3):244-53. doi: 10.1016/j.compbiolchem.2005.04.004.
In this paper, we present a simple and efficient whole genome alignment method based on maximal exact matches (MEMs). The major problem with MEM anchors is that the number of hits in non-homologous regions increases exponentially when shorter anchors are used to detect more homologous regions. To deal with this problem, we have developed a fast and accurate anchor filtering scheme based on simple match extension with minimum percent identity and extension length criteria. Because the scheme is simple and accurate, all MEM anchors in a pair of genomes can be exhaustively tested and filtered. In addition, incorporating the translation technique further improves the alignment quality and speed of our genome alignment algorithm. As a result, our algorithm, GAME (Genome Alignment by Match Extension), performs competitively with existing algorithms and can align large whole genomes, e.g., A. thaliana, without the large memory and parallel processors typically required. We show this in an experiment that compares the performance of BLAST, BLASTZ, PatternHunter, MUMmer and our algorithm in aligning all 45 pairs of 10 microbial genomes. A second experiment, in which all pairs of the five chromosomes of A. thaliana were compared, demonstrates the scalability of our algorithm.
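The anchor filtering idea described in the abstract (extend each exact match and keep it only if the extension meets minimum percent identity and extension length criteria) can be illustrated with a minimal Python sketch. This is not the GAME implementation: the function names `flank_identity` and `keep_anchor`, the ungapped extension, the pooling of both flanks, and the default thresholds `ext_len=20` and `min_identity=0.6` are all assumptions made for illustration, since the abstract does not specify these details.

```python
def flank_identity(a, b, i, j, step, ext_len):
    """Count ungapped matches over up to ext_len positions, walking from
    (i, j) in direction step (+1 = rightward, -1 = leftward)."""
    matches = compared = 0
    while compared < ext_len and 0 <= i < len(a) and 0 <= j < len(b):
        matches += a[i] == b[j]
        compared += 1
        i += step
        j += step
    return matches, compared


def keep_anchor(a, b, i, j, length, ext_len=20, min_identity=0.6):
    """Decide whether the exact match a[i:i+length] == b[j:j+length]
    survives the (hypothetical) extension filter.

    Assumed criteria: the two flanks together must cover at least
    ext_len positions and reach at least min_identity identity.
    """
    r_match, r_cmp = flank_identity(a, b, i + length, j + length, +1, ext_len)
    l_match, l_cmp = flank_identity(a, b, i - 1, j - 1, -1, ext_len)
    compared = r_cmp + l_cmp
    if compared < ext_len:
        # Too little flanking sequence to judge the anchor.
        return False
    return (r_match + l_match) / compared >= min_identity


# Toy sequences sharing the exact match "GATTACA" at offset 10.
a = "ACGTACGTAC" + "GATTACA" + "TTGCATTGCA"
b_similar = "ACGTACGTAT" + "GATTACA" + "ATGCATTGCA"    # flanks differ at 2 of 20 positions
b_unrelated = "TTTTTTTTTT" + "GATTACA" + "CCCCCCCCCC"  # flanks are dissimilar

print(keep_anchor(a, b_similar, 10, 10, 7, ext_len=10))
print(keep_anchor(a, b_unrelated, 10, 10, 7, ext_len=10))
```

In this sketch the anchor with similar flanks is kept and the one with unrelated flanks is discarded. Each anchor is tested in time proportional to `ext_len`, which is the kind of cheap per-anchor check that would make exhaustive filtering of all MEM anchors feasible.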