De La Vega Francisco M, Gordon Derek, Su Xiaoping, Scafe Charles, Isaac Hadar, Gilbert Dennis A, Spier Eugene G
Applied Biosystems, Foster City, CA 94404, USA.
Hum Hered. 2005;60(1):43-60. doi: 10.1159/000087918. Epub 2005 Sep 2.
Power and sample size calculations are critical parts of any research design for genetic association. We present a method that uses haplotype frequency information and average marker-marker linkage disequilibrium for SNPs typed in and around all genes on a chromosome. The test statistic is the classic likelihood ratio test applied to haplotypes in case/control populations. Haplotype frequencies are computed by specifying genetic model parameters, and power is determined from the test's non-centrality parameter. Power per gene is computed as a weighted average of the power obtained by assuming, in turn, that each haplotype is associated with the trait. We apply our method to genotype data from dense SNP maps across three entire chromosomes (6, 21, and 22) for three human populations (African-American, Caucasian, Chinese), three models of disease (additive, dominant, and multiplicative), and two trait allele frequencies (rare, common). We perform a regression analysis using these factors, average marker-marker disequilibrium, and the haplotype diversity across the gene region to determine which factors most significantly affect average power for a gene in our data. In addition, as a 'proof of principle' calculation, we perform power and sample size calculations for all genes within 100 kb of the PSORS1 locus (chromosome 6) for a previously published association study of psoriasis. Our regression analysis indicates that four highly significant factors determine average power to detect association: disease model, average marker-marker disequilibrium, haplotype diversity, and trait allele frequency. These findings may have important implications for the design of well-powered candidate gene association studies.
Our power and sample size calculations for the PSORS1 gene appear consistent with published findings, namely that there is substantial power (>0.99) for most genes within 100 kb of the PSORS1 locus at the 0.01 significance level.
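The chain from haplotype frequencies to power described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names are hypothetical, and three assumptions are made that the abstract does not state: the non-centrality parameter is approximated by evaluating the likelihood ratio statistic at the expected haplotype counts, the test has (number of haplotypes - 1) degrees of freedom, and the weights for the per-gene average are supplied by the caller.

```python
import math

def chi2_cdf(x, df):
    """Central chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def ncx2_cdf(x, df, ncp):
    """Non-central chi-square CDF as a Poisson mixture of central chi-square CDFs.
    Suitable for moderate ncp; exp(-ncp/2) underflows for very large values."""
    half = ncp / 2.0
    weight, cum, total, j = math.exp(-half), 0.0, 0.0, 0
    while cum < 1.0 - 1e-12 and j < 10000:
        total += weight * chi2_cdf(x, df + 2 * j)
        cum += weight
        j += 1
        weight *= half / j
    return total

def chi2_critical(alpha, df):
    """Upper-alpha critical value of the central chi-square, found by bisection."""
    target, lo, hi = 1.0 - alpha, 0.0, float(max(df, 1))
    while chi2_cdf(hi, df) < target:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def lrt_power(ncp, df, alpha):
    """Power of a chi-square-distributed test with non-centrality parameter ncp."""
    return 1.0 - ncx2_cdf(chi2_critical(alpha, df), df, ncp)

def lrt_ncp(case_freqs, control_freqs, n_cases, n_controls):
    """ASSUMPTION: ncp approximated by the LRT statistic at expected haplotype counts,
    with two chromosomes per individual."""
    c1, c0 = 2 * n_cases, 2 * n_controls
    ncp = 0.0
    for p, q in zip(case_freqs, control_freqs):
        pooled = (c1 * p + c0 * q) / (c1 + c0)
        if p > 0:
            ncp += 2 * c1 * p * math.log(p / pooled)
        if q > 0:
            ncp += 2 * c0 * q * math.log(q / pooled)
    return ncp

def weighted_gene_power(powers, weights):
    """Weighted average of per-haplotype powers; the weighting scheme is caller-supplied."""
    return sum(p * w for p, w in zip(powers, weights)) / sum(weights)

if __name__ == "__main__":
    # Illustrative (made-up) haplotype frequencies for one gene region.
    cases = [0.45, 0.35, 0.20]
    controls = [0.40, 0.38, 0.22]
    ncp = lrt_ncp(cases, controls, n_cases=500, n_controls=500)
    print("ncp:", ncp, "power:", lrt_power(ncp, df=len(cases) - 1, alpha=0.01))
```

In this sketch, `lrt_power` would be called once per scenario (each haplotype in turn taken as the trait-associated one, with case frequencies derived from the chosen disease model), and `weighted_gene_power` would then combine the results into a single per-gene figure.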