Brinza Dumitru, Zelikovsky Alexander
Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA.
Bioinformatics. 2006 Feb 1;22(3):371-3. doi: 10.1093/bioinformatics/bti785. Epub 2005 Nov 15.
The 2SNP software package implements a new, very fast, scalable algorithm for haplotype inference based on genotype statistics collected only for pairs of SNPs. The software can be used for comparatively accurate phasing of large numbers of long genome sequences, e.g. those obtained from DNA arrays. 2SNP takes a genotype matrix as input and outputs the corresponding haplotype matrix. On datasets across 79 regions from HapMap, 2SNP is several orders of magnitude faster than GERBIL and PHASE while matching them in quality, as measured by the number of correctly phased genotypes, single-site errors and switching errors. For example, 2SNP requires 41 s on a 2 GHz Pentium 4 processor to phase 30 genotypes with 1381 SNPs (ENm010.7p15:2 data from HapMap), whereas GERBIL and PHASE require more than a week and make no fewer errors than 2SNP.
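To make the input/output relationship and the switching-error measure concrete, the following is a minimal Python sketch. It is not the 2SNP algorithm or its code. The encoding (0 and 1 for the two homozygous states, 2 for heterozygous) and the helper names `genotype_of` and `switch_errors` are assumptions chosen for illustration, and the example haplotypes are made up.

```python
from typing import List, Tuple

Haplotype = List[int]  # one allele (0 or 1) per SNP
Genotype = List[int]   # assumed encoding: 0/1 = homozygous, 2 = heterozygous


def genotype_of(h1: Haplotype, h2: Haplotype) -> Genotype:
    """Collapse a haplotype pair into the unphased genotype it explains."""
    return [a if a == b else 2 for a, b in zip(h1, h2)]


def switch_errors(true_pair: Tuple[Haplotype, Haplotype],
                  inferred_pair: Tuple[Haplotype, Haplotype]) -> int:
    """Count phase switches between consecutive heterozygous sites."""
    t1, t2 = true_pair
    i1, _ = inferred_pair
    # At each heterozygous site, note whether inferred strand 1 follows true strand 1.
    phase = [i1[k] == t1[k] for k in range(len(t1)) if t1[k] != t2[k]]
    # A switch error occurs whenever that relationship flips between neighbours.
    return sum(1 for a, b in zip(phase, phase[1:]) if a != b)


# Hypothetical data: one individual, five SNPs.
true_pair = ([0, 1, 0, 1, 1], [1, 0, 0, 0, 0])
inferred_pair = ([0, 1, 0, 0, 0], [1, 0, 0, 1, 1])

genotype = genotype_of(*true_pair)                      # one row of the genotype matrix
consistent = genotype_of(*inferred_pair) == genotype    # inferred pair explains the genotype
n_switches = switch_errors(true_pair, inferred_pair)
```

In this toy case the inferred pair explains the same genotype as the true pair but flips phase once between the second and third heterozygous sites, so it counts as a single switching error.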