Kang Sun Jung, Gordon Derek, Finch Stephen J
Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA.
Genet Epidemiol. 2004 Feb;26(2):132-41. doi: 10.1002/gepi.10301.
Which genotype misclassification errors are most costly, in terms of increased sample size necessary (SSN) to maintain constant asymptotic power and significance level, when performing case/control studies of genetic association? We answer this question for single-nucleotide polymorphisms (SNPs), using the 2×3 χ² test of independence. Our strategy is to expand the noncentrality parameter of the asymptotic distribution of the χ² test under a specified alternative hypothesis to approximate SSN, using a linear Taylor series in the error parameters. We consider two scenarios: the first assumes Hardy-Weinberg equilibrium (HWE) for the true genotypes in both cases and controls, and the second assumes HWE only in controls. The Taylor series approximation has a relative error of less than 1% when each error rate is less than 2%. The most costly error is recording the more common homozygote as the less common homozygote, with indefinitely increasing cost coefficient as minor SNP allele frequencies approach 0 in both scenarios. The cost of misclassifying the more common homozygote to the heterozygote also becomes indefinitely large as the minor SNP allele frequency goes to 0 under both scenarios. For the violation of HWE modeled here, the cost of misclassifying a heterozygote to the less common homozygote becomes large, although bounded. Therefore, the use of SNPs with a small minor allele frequency requires careful attention to the frequency of genotyping errors to ensure that power specifications are met. Furthermore, the design of automated genotyping should minimize those errors whose cost coefficients can become indefinitely large.
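The sample-size cost described in the abstract can be illustrated with a small numerical sketch. It is not the authors' Taylor-series derivation: it computes the per-subject noncentrality parameter of the 2×3 χ² test directly, with and without a single misclassification error, and takes the ratio as the factor by which the sample size would need to grow to keep the same asymptotic power. The function names, the 3×3 error-matrix representation, the equal case/control split, and the allele frequencies and 1% error rate are all assumptions chosen for illustration; only the first scenario (HWE in both cases and controls) is covered.

```python
# Illustrative sketch only: exact noncentrality ratio, not the paper's
# linear Taylor approximation. All names and numbers are assumptions.

# Genotype order used throughout:
# 0 = more common homozygote, 1 = heterozygote, 2 = less common homozygote
COMMON_HOM, HET, RARE_HOM = 0, 1, 2


def hwe_genotypes(minor_allele_freq):
    """Genotype frequencies under Hardy-Weinberg equilibrium."""
    q = minor_allele_freq
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]


def single_error_matrix(true_idx, observed_idx, rate):
    """3x3 matrix E[i][j] = P(observed j | true i) with one nonzero error."""
    matrix = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    matrix[true_idx][true_idx] = 1.0 - rate
    matrix[true_idx][observed_idx] = rate
    return matrix


def apply_errors(true_freqs, error_matrix):
    """Observed genotype frequencies after misclassification."""
    return [
        sum(true_freqs[i] * error_matrix[i][j] for i in range(3))
        for j in range(3)
    ]


def per_subject_ncp(case_freqs, control_freqs, case_fraction=0.5):
    """Noncentrality parameter of the 2x3 chi-square test divided by N."""
    r = case_fraction
    total = 0.0
    for g_case, g_ctrl in zip(case_freqs, control_freqs):
        pooled = r * g_case + (1 - r) * g_ctrl  # column marginal
        total += (g_case - g_ctrl) ** 2 / pooled
    return r * (1 - r) * total


def sample_size_inflation(q_case, q_control, error_matrix):
    """Factor by which N must grow to keep the error-free noncentrality."""
    true_case = hwe_genotypes(q_case)
    true_ctrl = hwe_genotypes(q_control)
    ncp_true = per_subject_ncp(true_case, true_ctrl)
    ncp_observed = per_subject_ncp(
        apply_errors(true_case, error_matrix),
        apply_errors(true_ctrl, error_matrix),
    )
    return ncp_true / ncp_observed


if __name__ == "__main__":
    # Hypothetical minor allele frequencies and a 1% error rate.
    q_case, q_control, rate = 0.15, 0.10, 0.01
    for label, true_idx, obs_idx in [
        ("common homozygote -> rare homozygote", COMMON_HOM, RARE_HOM),
        ("common homozygote -> heterozygote", COMMON_HOM, HET),
        ("heterozygote -> common homozygote", HET, COMMON_HOM),
    ]:
        errors = single_error_matrix(true_idx, obs_idx, rate)
        factor = sample_size_inflation(q_case, q_control, errors)
        print(f"{label}: sample size factor {factor:.4f}")
```

Because the required noncentrality parameter is fixed by the chosen power and significance level, the ratio of per-subject values equals the ratio of required sample sizes. Rerunning the sketch with smaller minor allele frequencies gives a way to explore how the cost of each error type changes as the minor allele becomes rarer.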