Moskvina Valentina, Craddock Nick, Holmans Peter, Owen Michael J, O'Donovan Michael C
Bioinformatics and Biostatistics Unit, School of Medicine, Wales College of Medicine, Cardiff University, Cardiff, UK.
Hum Hered. 2006;61(1):55-64. doi: 10.1159/000092553. Epub 2006 Apr 6.
It is well known that genotyping error reduces the power of genetic case-control association studies, but there is little research on its effect on type I error, and none that has addressed possible differences in genotyping error rates between cases and controls.
We used simulations to examine the influence of genotyping error on the type I error probability given by case-control studies. The effect of genotyping error on the magnitude of type I error was explored for a single marker of varying minor allele frequency (MAF), and for haplotypic tests based on two markers with varying MAF and linkage disequilibrium (LD) measure r².
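The single-marker part of such a simulation can be sketched as follows. This is a minimal illustration and not the authors' procedure: the abstract does not specify the error model or the test statistic, so the per-allele misclassification model, the allelic chi-square test, and all function and parameter names (`observed_maf`, `allelic_chi2`, `type1_rate`, `err_cases`, `err_controls`) are assumptions made here for illustration. Cases and controls are drawn from the same true allele frequency, so every rejection is a false positive.

```python
import random

# 5% critical value of the chi-square distribution with 1 degree of freedom.
CHI2_CRIT_1DF = 3.841


def observed_maf(maf, err_common_to_rare, err_rare_to_common):
    """Minor-allele frequency after misclassification.

    Assumed error model: each allele is misread independently with the
    given directional probabilities.
    """
    return maf * (1 - err_rare_to_common) + (1 - maf) * err_common_to_rare


def allelic_chi2(a, b, c, d):
    """Pearson chi-square for the 2x2 allele-count table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence of association
    return n * (a * d - b * c) ** 2 / denom


def type1_rate(maf, n_per_group, err_cases, err_controls, reps=400, seed=1):
    """Fraction of null replicates in which the allelic test rejects at 0.05.

    err_cases and err_controls are the probabilities of reading a common
    allele as the rare one in each group; the reverse error is set to zero
    in this sketch.
    """
    rng = random.Random(seed)
    p_case = observed_maf(maf, err_cases, 0.0)
    p_ctrl = observed_maf(maf, err_controls, 0.0)
    alleles = 2 * n_per_group  # two alleles per individual
    rejections = 0
    for _ in range(reps):
        case_minor = sum(rng.random() < p_case for _ in range(alleles))
        ctrl_minor = sum(rng.random() < p_ctrl for _ in range(alleles))
        chi2 = allelic_chi2(case_minor, alleles - case_minor,
                            ctrl_minor, alleles - ctrl_minor)
        rejections += chi2 > CHI2_CRIT_1DF
    return rejections / reps
```

Under these assumptions, comparing `type1_rate(0.05, 1000, 0.0, 0.0)` with `type1_rate(0.05, 1000, 0.01, 0.0)` contrasts equal error rates with a 1% excess of common-to-rare miscalls in cases only; raising `n_per_group` or lowering `maf` lets the other factors named in the abstract be explored in the same way.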
We show that even with low genotyping error rates (<0.01), systematic differences in the error rate between samples can result in type I error rates substantially above 0.05. The effect was maximal for markers with small MAF, markers in strong LD, and where a common allele is more frequently misclassified as a rare allele than vice versa. The problem was also exacerbated by the use of large samples.
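A rough back-of-envelope argument shows why these factors matter. It is a sketch under assumptions not stated in the abstract: a single biallelic marker with true minor allele frequency $p$, $N$ cases and $N$ controls, an allelic test, errors only in cases, and independent per-allele misclassification probabilities $\varepsilon_c$ (common read as rare) and $\varepsilon_r$ (rare read as common). All symbols are introduced here for illustration.

\[
p_{\text{cases}} = p\,(1-\varepsilon_r) + (1-p)\,\varepsilon_c ,
\qquad
p_{\text{controls}} = p ,
\qquad
\delta = p_{\text{cases}} - p_{\text{controls}} = (1-p)\,\varepsilon_c - p\,\varepsilon_r .
\]

With $2N$ alleles per group and pooled frequency $\bar p = (p_{\text{cases}} + p_{\text{controls}})/2$, the allelic chi-square statistic has approximate noncentrality

\[
\lambda \approx \frac{N\,\delta^{2}}{\bar p\,(1-\bar p)} .
\]

For example, $p = 0.05$, $\varepsilon_c = 0.01$, $\varepsilon_r = 0$ and $N = 1000$ give $\delta = 0.0095$, $\bar p = 0.05475$ and $\lambda \approx 1.74$, where $\lambda = 0$ would correspond to the nominal 0.05 level. In this approximation $\lambda$ grows linearly with $N$, the denominator shrinks as $p$ becomes small, and because $1-p > p$ for a minor allele, a common-to-rare error contributes more to $\delta$ than a rare-to-common error of the same size.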
Our results show that small differential genotyping error rates between cases and controls pose significant problems for association analyses. Differential genotyping error rates are particularly likely to arise where genotype data are combined from multiple sites, or where case genotypes are examined against archived reference population cohort genotypes that are being generated in several countries. Although these strategies may be necessary to obtain adequately powered samples, our data show the importance of stringent quality control. Furthermore, associations based on rare haplotypes should be treated with caution.