Huang Yao-Ting, Chao Kun-Mao
Department of Computer Science and Information Engineering, National Chung-Cheng University, Chia-Yi, Taiwan.
J Biomed Inform. 2008 Dec;41(6):953-61. doi: 10.1016/j.jbi.2008.04.003. Epub 2008 Apr 12.
This paper proposes a new framework for selecting tag SNPs based on haplotypes rather than on single SNPs. The tag SNPs found by this framework define a set of haplotypes that completely predicts the alleles of all untyped SNPs. We refer to this problem as MTMH, defined as follows: given a set of SNPs, find a minimum subset of SNPs (called tag SNPs) that defines a set of haplotypes completely predictive of the alleles of all untyped SNPs. We solve the MTMH problem by dividing it into three subproblems, two of which we show to be NP-hard, and we propose several exact and approximation algorithms for these subproblems. We describe a framework that integrates these algorithms and develop a program called HapTagger for finding tag SNPs. Using a variety of real data sets, we compare HapTagger with existing methods and with Haploview, the official tagging tool of the International HapMap Project. Our theoretical analysis and experimental results indicate that HapTagger consistently identifies a smaller set of tag SNPs and runs much faster than existing methods. Because HapTagger does not need to incorporate a linkage disequilibrium statistic, it is significantly more computationally efficient. We also present an algorithm, specific to HapTagger, for reconstructing the alleles of untyped SNPs. In addition, the predictive haplotypes selected by HapTagger can be used as signatures of recent positive selection or co-evolution. HapTagger is available at http://www.csie.ntu.edu.tw/~kmchao/tools/HapTagger/.
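The MTMH definition can be made concrete with a small sketch. It assumes a simplified reading of "completely predictive": a tag set qualifies if any two sample haplotypes that carry the same alleles at every tag SNP also carry the same allele at every untyped SNP. The sketch finds a smallest such tag set by exhaustive search, which takes exponential time and is only practical for a handful of SNPs. It is not HapTagger's algorithm, which splits MTMH into three subproblems and solves them with exact and approximation algorithms. The function names `is_predictive`, `min_tag_snps` and `reconstruct`, and the encoding of each haplotype as a string with one allele character per SNP, are hypothetical.

```python
from itertools import combinations
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical encoding: one allele character per SNP, e.g. "AAGT".
Haplotype = str


def is_predictive(haps: Sequence[Haplotype], tags: Tuple[int, ...]) -> bool:
    """Return True if the alleles at `tags` determine every untyped allele.

    Simplified criterion (an assumption, not HapTagger's definition):
    two haplotypes that agree at all tag SNPs must agree at every SNP.
    """
    seen: Dict[Tuple[str, ...], Haplotype] = {}
    for h in haps:
        key = tuple(h[i] for i in tags)
        # A second, different haplotype with the same tag pattern means the
        # tag alleles cannot predict the remaining (untyped) alleles.
        if seen.setdefault(key, h) != h:
            return False
    return True


def min_tag_snps(haps: Sequence[Haplotype]) -> Tuple[int, ...]:
    """Exhaustively search for a smallest predictive tag set (exponential time)."""
    if not haps:
        return ()
    m = len(haps[0])
    for k in range(m + 1):
        for tags in combinations(range(m), k):
            if is_predictive(haps, tags):
                return tags
    # Unreachable: using all m SNPs as tags is always predictive.
    raise AssertionError("no predictive tag set found")


def reconstruct(
    haps: Sequence[Haplotype], tags: Tuple[int, ...], typed: Sequence[str]
) -> Optional[Haplotype]:
    """Recover a full haplotype from alleles typed at the tag SNPs.

    Returns None if the typed pattern does not occur in the reference haplotypes.
    """
    table = {tuple(h[i] for i in tags): h for h in haps}
    return table.get(tuple(typed))
```

For the four haplotypes `AAGT`, `ACGA`, `GCTA` and `GAGT`, no single SNP is predictive, but SNPs 0 and 1 together are, so `min_tag_snps` returns `(0, 1)`. Given typed alleles `("G", "C")` at those two positions, `reconstruct` returns `GCTA`, filling in the untyped SNPs 2 and 3 by table lookup.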