Ji Fei, Finch Stephen J, Haynes Chad, Mendell Nancy R, Gordon Derek
Lab of Statistical Genetics, Rockefeller University, New York, NY, USA.
BMC Genomics. 2007 Jul 16;8:238. doi: 10.1186/1471-2164-8-238.
Studies of association methods using DNA pooling of single nucleotide polymorphisms (SNPs) have focused primarily on the effects of "machine-error", number of replicates, and the size of the pool. We use the non-centrality parameter (NCP) for the analysis of variance test to compute the approximate power for genetic association tests with DNA pooling data on cases and controls. We incorporate genetic model parameters into the computation of the NCP. Parameters involved in the power calculation are disease allele frequency, frequency of the marker SNP allele in coupling with the disease locus, disease prevalence, genotype relative risk, sample size, genetic model, number of pools, number of replicates of each pool, and the proportion of variance of the pooled frequency estimate due to machine variability. We compute power for different settings of number of replicates and total number of genotypings when the genetic model parameters are fixed. Several significance levels are considered, including stringent significance levels (due to the increasing popularity of 100 K and 500 K SNP "chip" data). We use a factorial design with two to four settings of each parameter and multiple regression analysis to assess which parameters most significantly affect power.
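To illustrate how genetic model parameters can feed a non-centrality parameter and an approximate power, here is a minimal Python sketch. It is not the authors' ANOVA-based computation. It assumes Hardy-Weinberg proportions, treats the typed SNP as the disease locus itself (so the marker allele frequency in coupling is ignored), and replaces the ANOVA F test with a large-sample 1-df chi-square approximation. The function names and the `extra_var` argument, a stand-in for added measurement variance from pooling, are hypothetical.

```python
from statistics import NormalDist


def _allele_freq(geno_freqs):
    """Frequency of the disease allele given (dd, Dd, DD) genotype frequencies."""
    return 0.5 * geno_freqs[1] + geno_freqs[2]


def case_control_allele_freqs(p, prevalence, grr, model="multiplicative"):
    """Return (case, control) frequencies of the disease allele.

    Assumption: Hardy-Weinberg proportions, and the typed SNP is the
    disease locus itself.
    """
    # Relative risks for carriers of one and two copies of the disease allele.
    r1, r2 = {
        "dominant": (grr, grr),
        "recessive": (1.0, grr),
        "multiplicative": (grr, grr ** 2),
    }[model]
    q = 1.0 - p
    geno = (q * q, 2.0 * p * q, p * p)  # HWE genotype frequencies
    rel = (1.0, r1, r2)
    # Baseline penetrance chosen so the population prevalence is matched.
    f0 = prevalence / sum(g * r for g, r in zip(geno, rel))
    pen = [f0 * r for r in rel]
    if max(pen) > 1.0:
        raise ValueError("parameters imply a penetrance above 1")
    case = [g * f / prevalence for g, f in zip(geno, pen)]
    ctrl = [g * (1.0 - f) / (1.0 - prevalence) for g, f in zip(geno, pen)]
    return _allele_freq(case), _allele_freq(ctrl)


def ncp_allele_test(p_case, p_ctrl, n_case, n_ctrl, extra_var=0.0):
    """NCP of a 1-df test on the case/control allele frequency difference.

    extra_var is a hypothetical additive variance term standing in for
    machine (measurement) error; 0.0 means individual genotyping.
    """
    sampling_var = (p_case * (1.0 - p_case) / (2.0 * n_case)
                    + p_ctrl * (1.0 - p_ctrl) / (2.0 * n_ctrl))
    return (p_case - p_ctrl) ** 2 / (sampling_var + extra_var)


def power_1df(ncp, alpha):
    """Power of a 1-df noncentral chi-square test at significance level alpha."""
    z = NormalDist()
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    root = ncp ** 0.5
    return (1.0 - z.cdf(crit - root)) + z.cdf(-crit - root)


if __name__ == "__main__":
    pc, pu = case_control_allele_freqs(p=0.2, prevalence=0.05, grr=1.5)
    lam = ncp_allele_test(pc, pu, n_case=500, n_ctrl=500)
    for alpha in (0.05, 1e-4, 1e-7):
        print(alpha, round(power_1df(lam, alpha), 3))
```

In this sketch, any added measurement variance shrinks the NCP and therefore the power. The trade-off between replicates and pools studied in the abstract also depends on the degrees of freedom of the ANOVA test, which the sketch does not model.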
Power can increase substantially as the number of genotypings increases. For a fixed number of genotypings, power is a function of the number of replicates of each pool, and there is a setting at which power is maximized. The four parameters that most significantly affect power for association are: (1) genotype relative risk, (2) genetic model, (3) sample size, and (4) the interaction term between disease and SNP marker allele probabilities.
For a fixed number of genotypings, there is an optimal number of replicates of each pool that increases as the number of genotypings increases. Power is not substantially reduced when the number of replicates is close to but not equal to the optimal setting.