Zhang Kui, Sun Fengzhu
Department of Biostatistics, School of Public Health, University of Alabama, Birmingham, AL 35294, USA.
BMC Genet. 2005 Oct 19;6:51. doi: 10.1186/1471-2156-6-51.
Recent studies have indicated that the human genome can be divided into regions of low haplotype diversity interspersed with regions of high haplotype diversity. In regions of low haplotype diversity, a small fraction of SNPs (tag SNPs) is sufficient to account for most of the haplotype diversity. Tag SNPs can therefore be extremely useful for testing the association of a marker locus with a qualitative or quantitative trait locus, because it may not be necessary to genotype all the SNPs. When tag SNPs are used to reduce the genotyping effort in association studies, it is important to know how much power is lost. It is also important to know how much power is gained when tag SNPs are used instead of the same number of randomly chosen SNPs.
We design a simulation study to tackle these problems for a variety of quantitative association tests using either case-parent samples or unrelated population samples. First, the samples are generated based on the quantitative trait model with the assumption of either an extremal sampling scheme or a random sampling scheme. Second, a small number of samples are selected to determine the haplotype blocks and the tag SNPs. Third, the statistical power of the tests is evaluated using four kinds of data: (1) all the SNPs and the corresponding haplotypes, (2) the tag SNPs and the corresponding haplotypes, (3) the same number of evenly spaced SNPs with minor allele frequency greater than a threshold and the corresponding haplotypes, (4) the same number of randomly chosen SNPs and their corresponding haplotypes.
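The sampling and marker-selection steps described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes haplotypes are stored as lists of 0/1 alleles, that "evenly spaced" means evenly spaced by index among the SNPs passing the frequency filter (physical positions could be used instead), and that extremal sampling keeps the individuals in both tails of the trait distribution. The function names, the 0.05 default threshold, and the seed are hypothetical.

```python
import random


def minor_allele_freq(haplotypes, j):
    """Minor allele frequency of SNP j; haplotypes are lists of 0/1 alleles."""
    p = sum(h[j] for h in haplotypes) / len(haplotypes)
    return min(p, 1.0 - p)


def evenly_spaced_snps(haplotypes, k, maf_threshold=0.05):
    """Pick k SNPs spread evenly (by index) among SNPs with MAF > threshold.

    Assumption: spacing is by rank among eligible SNPs, not physical distance.
    """
    n_snps = len(haplotypes[0])
    eligible = [j for j in range(n_snps)
                if minor_allele_freq(haplotypes, j) > maf_threshold]
    if not 1 <= k <= len(eligible):
        raise ValueError("k must be between 1 and the number of eligible SNPs")
    if k == 1:
        return [eligible[len(eligible) // 2]]
    step = (len(eligible) - 1) / (k - 1)
    return [eligible[int(i * step + 0.5)] for i in range(k)]


def random_snps(haplotypes, k, seed=0):
    """Pick k distinct SNPs uniformly at random, returned in index order."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(len(haplotypes[0])), k))


def extremal_sample(trait_values, n_each):
    """Indices of the n_each lowest and n_each highest trait values.

    Assumption: extremal sampling keeps both tails of the trait distribution.
    """
    if n_each < 1 or 2 * n_each > len(trait_values):
        raise ValueError("n_each must be >= 1 and at most half the sample size")
    order = sorted(range(len(trait_values)), key=lambda i: trait_values[i])
    return order[:n_each] + order[-n_each:]


if __name__ == "__main__":
    # Toy data: 4 haplotypes over 6 SNPs; SNP 2 is monomorphic and is filtered out.
    haps = [[0, 1, 0, 1, 0, 1],
            [1, 0, 0, 1, 1, 0],
            [0, 1, 0, 0, 1, 1],
            [1, 0, 0, 1, 0, 1]]
    print(evenly_spaced_snps(haps, 3))
    print(random_snps(haps, 3))
    print(extremal_sample([5.0, 1.0, 3.0, 9.0, 7.0], 1))
```

In a comparison of this kind, the evenly spaced and random sets would be given the same size as the tag SNP set, so that any power difference reflects which SNPs were chosen and not how many.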
Our results suggest that, in most situations, genotyping effort can be substantially reduced by using tag SNPs to map the QTL in association studies without much loss of power, which is consistent with previous studies on association mapping of qualitative traits. In all situations considered, two-locus haplotype analyses using tag SNPs are more powerful than those using the same number of randomly selected SNPs, but the size of the power difference depends on the sampling scheme and the population history.