Zhang Kui, Calabrese Peter, Nordborg Magnus, Sun Fengzhu
Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles 90089, USA.
Am J Hum Genet. 2002 Dec;71(6):1386-94. doi: 10.1086/344780. Epub 2002 Nov 18.
Recent studies have shown that the human genome has a haplotype block structure, such that it can be divided into discrete blocks of limited haplotype diversity. In each block, a small fraction of single-nucleotide polymorphisms (SNPs), referred to as "tag SNPs," can be used to distinguish a large fraction of the haplotypes. These tag SNPs can potentially be extremely useful for association studies, in that it may not be necessary to genotype all SNPs; however, this depends on how much power is lost. Here we use a simulation study to quantitatively assess the power loss for a variety of study designs, including case-control designs and case-parental control designs. First, a number of data sets containing case-parental or case-control samples are generated on the basis of a disease model. Second, a small fraction of the case and control individuals in each data set are genotyped at all loci, and a dynamic programming algorithm is used to determine the haplotype blocks and the tag SNPs from the genotypes of the sampled individuals. Third, the statistical power of the tests is evaluated on the basis of three kinds of data: (1) all of the SNPs and the corresponding haplotypes, (2) the tag SNPs and the corresponding haplotypes, and (3) as many randomly chosen SNPs as there are tag SNPs, together with the corresponding haplotypes. We study the power of different association tests with a variety of disease models and block-partitioning criteria. Our study indicates that genotyping effort can be substantially reduced by using tag SNPs, without much loss of power. Depending on the specific haplotype block-partitioning algorithm and the disease model, when the identified tag SNPs make up only 25% of all the SNPs, the power is reduced by only 4%, on average, compared with a power loss of approximately 12% when the same number of randomly chosen SNPs is used in a two-locus haplotype analysis. When the identified tag SNPs make up approximately 14% of all the SNPs, the power is reduced by approximately 9%, compared with a power loss of approximately 21% when the same number of randomly chosen SNPs is used in a two-locus haplotype analysis. Our study also indicates that haplotype-based analysis can be much more powerful than marker-by-marker analysis.
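The block-partitioning step can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions: haplotypes are phased 0/1 strings with no missing data; a candidate block is valid when the sub-haplotypes seen at least twice cover at least a fraction `ALPHA` of all haplotypes (0.8 here, an assumed value); and a block's tag SNPs are the smallest set of its SNPs that separates those common sub-haplotypes. The dynamic program then chooses the contiguous partition with the fewest tag SNPs in total. `ALPHA`, `max_width`, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

ALPHA = 0.8  # assumed coverage threshold for a valid block (hypothetical)


def common_haplotypes(haps, i, j):
    """Distinct sub-haplotypes over SNPs i..j (inclusive) seen at least twice,
    or None if together they cover less than ALPHA of all haplotypes."""
    counts = Counter(h[i:j + 1] for h in haps)
    common = [s for s, c in counts.items() if c >= 2]
    if sum(counts[s] for s in common) < ALPHA * len(haps):
        return None
    return common


def min_tag_snps(common, width):
    """Smallest set of column offsets giving each common haplotype a distinct
    pattern. Exhaustive search, so only practical for narrow blocks."""
    if len(common) <= 1:
        return ()
    for k in range(1, width + 1):
        for cols in combinations(range(width), k):
            if len({tuple(s[c] for c in cols) for s in common}) == len(common):
                return cols
    raise ValueError("common haplotypes must be distinct")


def partition_blocks(haps, max_width=8):
    """Split SNPs 0..m-1 into contiguous blocks minimising the total number
    of tag SNPs. Returns (total, [((first, last), tag_snp_indices), ...])."""
    m = len(haps[0])
    inf = float("inf")
    best = [0] + [inf] * m   # best[j]: fewest tags covering SNPs 0..j-1
    back = [None] * (m + 1)  # back[j]: (start of last block, its tag SNPs)
    for j in range(1, m + 1):
        for i in range(max(0, j - max_width), j):
            if best[i] == inf:
                continue
            common = common_haplotypes(haps, i, j - 1)
            if common is not None:
                tags = min_tag_snps(common, j - i)
            elif j - i == 1:
                tags = (0,)  # assumption: a lone SNP is always its own block
            else:
                continue
            if best[i] + len(tags) < best[j]:
                best[j] = best[i] + len(tags)
                back[j] = (i, tuple(i + c for c in tags))
    # Walk the back-pointers from the last SNP to recover the blocks.
    blocks, j = [], m
    while j > 0:
        i, tags = back[j]
        blocks.append(((i, j - 1), tags))
        j = i
    return best[m], blocks[::-1]


print(partition_blocks(["0000", "0011", "1100", "1111"]))
# → (2, [((0, 1), (0,)), ((2, 3), (2,))])
```

With these four haplotypes every combination appears once, so no block spanning the junction between SNPs 1 and 2 meets the coverage criterion, and the sketch returns two blocks tagged by one SNP each. The exhaustive tag search grows exponentially with block width, which is why `max_width` caps it here.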
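Power in a simulation study of this kind is the fraction of simulated data sets in which a test rejects the null hypothesis at level α, and the power loss from genotyping fewer SNPs compares that fraction between the full and the reduced marker sets. The Python sketch below shows this bookkeeping for a single-marker allelic chi-square test on case-control allele counts. The test, the allele frequencies, and the function names are illustrative assumptions, not the paper's haplotype-based or case-parental tests. The abstract also does not state whether its percentages are absolute or relative differences, so `relative_power_loss` reflects only one possible reading.

```python
import math
import random


def allelic_chi2_pvalue(a_case, n_case, a_ctrl, n_ctrl):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 table
    of allele counts: a_* copies of the tested allele out of n_* alleles."""
    table = [[a_case, n_case - a_case], [a_ctrl, n_ctrl - a_ctrl]]
    rows = [n_case, n_ctrl]
    cols = [a_case + a_ctrl, (n_case + n_ctrl) - (a_case + a_ctrl)]
    total = n_case + n_ctrl
    stat = 0.0
    for r in range(2):
        for c in range(2):
            expected = rows[r] * cols[c] / total
            if expected > 0:
                stat += (table[r][c] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))


def estimate_power(p_case, p_ctrl, n_alleles, reps=1000, alpha=0.05, seed=1):
    """Fraction of simulated data sets whose test rejects at level alpha.
    Alleles are drawn independently with frequency p_case or p_ctrl."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a_case = sum(rng.random() < p_case for _ in range(n_alleles))
        a_ctrl = sum(rng.random() < p_ctrl for _ in range(n_alleles))
        if allelic_chi2_pvalue(a_case, n_alleles, a_ctrl, n_alleles) < alpha:
            rejections += 1
    return rejections / reps


def relative_power_loss(power_full, power_reduced):
    """Power lost by the reduced marker set, as a fraction of the full power."""
    return (power_full - power_reduced) / power_full


# Hypothetical frequencies: the causal SNP itself vs. a proxy in weaker LD.
full = estimate_power(0.30, 0.20, n_alleles=400)
proxy = estimate_power(0.27, 0.22, n_alleles=400)
print(full, proxy, relative_power_loss(full, proxy))
```

Here the hypothetical proxy marker's smaller case-control frequency difference stands in for incomplete linkage disequilibrium with the causal variant. Comparing the two power estimates plays the same role as the all-SNP versus tag-SNP comparison in the abstract, although the paper's own tests and disease models are more elaborate.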