Govindarajulu Usha S, Spiegelman Donna, Miller Katie L, Kraft Peter
Department of Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA.
Genet Epidemiol. 2006 Nov;30(7):590-601. doi: 10.1002/gepi.20170.
Genotyping errors can induce biases in frequency estimates for haplotypes of single nucleotide polymorphisms (SNPs). Here, we considered the impact of SNP allele misclassification on haplotype odds ratio estimates from case-control studies of unrelated individuals.
We calculated bias analytically, using the haplotype counts expected in cases and controls under genotype misclassification. We evaluated the bias due to allele misclassification across a range of haplotype distributions using empirical haplotype frequencies within blocks of limited haplotype diversity. We also considered simple two- and three-locus haplotype distributions to understand the impact of haplotype frequency and number of SNPs on misclassification bias.
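The analytic calculation described above can be illustrated with a minimal sketch. It is not the authors' method: it assumes phase is known and applies independent per-allele errors directly to haplotypes, whereas the study works from genotype misclassification. The control frequencies, the true odds ratio of 2, and all function names are hypothetical values chosen for illustration.

```python
from itertools import product
from math import log

def misclassification_matrix(n_snps, e):
    """M[true][obs] = P(observed haplotype | true haplotype), assuming each
    allele is miscalled independently with probability e (an assumption)."""
    haps = list(product((0, 1), repeat=n_snps))
    M = {}
    for t in haps:
        M[t] = {}
        for o in haps:
            d = sum(a != b for a, b in zip(t, o))  # number of miscalled alleles
            M[t][o] = (e ** d) * ((1 - e) ** (n_snps - d))
    return haps, M

def observed_freqs(true_freqs, M):
    """Expected haplotype frequencies after misclassification."""
    obs = {h: 0.0 for h in true_freqs}
    for t, p in true_freqs.items():
        for o, q in M[t].items():
            obs[o] += p * q
    return obs

def observed_or(e, ctrl, true_or, hap, ref):
    """Expected odds ratio for hap versus ref at per-allele error rate e."""
    n_snps = len(ref)
    _, M = misclassification_matrix(n_snps, e)
    # Case frequencies under a multiplicative haplotype model (hypothetical).
    unnorm = {h: p * true_or.get(h, 1.0) for h, p in ctrl.items()}
    total = sum(unnorm.values())
    case = {h: v / total for h, v in unnorm.items()}
    obs_case = observed_freqs(case, M)
    obs_ctrl = observed_freqs(ctrl, M)
    return (obs_case[hap] / obs_case[ref]) / (obs_ctrl[hap] / obs_ctrl[ref])

# Hypothetical two-SNP example: haplotype (1, 1) has a true odds ratio of 2.
ctrl = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.1}
true_or = {(1, 1): 2.0}
or_obs = observed_or(0.005, ctrl, true_or, hap=(1, 1), ref=(0, 0))
pct_bias_log = 100 * (log(or_obs) - log(2.0)) / log(2.0)
print(f"observed OR = {or_obs:.4f}, bias on log scale = {pct_bias_log:.2f}%")
```

In this toy setting the expected odds ratio falls below the true value of 2, which is the direction of bias the abstract reports for common haplotypes. Raising `e` or adding SNPs to the haplotype lets one explore how the attenuation grows.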
We found that for common haplotypes (>5% frequency), realistic genotyping error rates (0.1-1% chance of miscalling an allele), and moderate relative risks (2-4), the bias was always towards the null and increased in magnitude with increasing error rate and increasing odds ratio. For common haplotypes, bias generally increased with increasing haplotype frequency, while for rare haplotypes, bias generally increased with decreasing frequency. When the chance of miscalling an allele was 0.5%, the median bias in haplotype-specific odds ratios for common haplotypes was generally small (<4% on the log odds ratio scale), but the bias for some individual haplotypes was larger (10-20%). Bias towards the null leads to a loss in power: the relative efficiency of a test statistic based on misclassified haplotype data, compared with a test based on the unobserved true haplotypes, ranged from roughly 60% to 80% and worsened with increasing haplotype frequency.
The cumulative effect of small allele-calling errors across multiple loci can induce noticeable bias and reduce power in realistic scenarios. This has implications for the design of candidate gene association studies that utilize multi-marker haplotypes.