Tradeoff between no-call reduction in genotyping error rate and loss of sample size for genetic case/control association studies.

Author information

Kang S J, Gordon D, Brown A M, Ott J, Finch S J

Affiliation

Department of Applied Mathematics and Statistics, State University of New York at Stony Brook, Stony Brook, NY 11794, USA.

Publication information

Pac Symp Biocomput. 2004:116-27. doi: 10.1142/9789812704856_0012.

DOI:10.1142/9789812704856_0012

PMID:14992497

Abstract

Single nucleotide polymorphisms (SNPs) may be genotyped for use in case-control designs to test for association between a SNP marker and a disease using a 2 × 3 chi-squared test of independence. Genotyping is often based on underlying continuous measurements, which are classified into genotypes. A "no-call" procedure is sometimes used in which borderline observations are not classified. This procedure simultaneously reduces the genotype error rate and the expected number of genotypes observed. Both quantities affect the power of the test statistic. We develop methods for calculating the genotype error rate, the expected number of genotypes observed, and the expected power of the resulting test as a function of the no-call procedure. We examine the statistical properties of the chi-squared test using a no-call procedure when the underlying continuous measure of genotype classification is a three-component mixture of univariate normal distributions under a range of parameter specifications. The genotype error rate decreases as the no-call region is increased. The expected number of observations genotyped also decreases. Our key finding is that the expected power of the chi-squared test is not sensitive to the no-call procedure. That is, the benefits of a reduced genotype error rate are almost exactly balanced by the losses due to fewer genotyped observations. When genotype classification rests on an underlying univariate normal mixture and the data are analyzed with a 2 × 3 chi-squared test, there is little, if any, increase in power from using a no-call procedure.
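The tradeoff described in the abstract can be illustrated with a minimal sketch. It is not the authors' method or parameterization: it assumes the three genotype components share one standard deviation, that calling thresholds sit at the midpoints between adjacent component means, and that the no-call region is a symmetric band of half-width `half_width` around each midpoint. The function names and the example parameter values are hypothetical. The sketch returns the call rate (the expected number of genotypes observed is the sample size times this rate) and the error rate among called genotypes.

```python
from math import erf, sqrt, inf


def norm_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal(mu, sigma) variable."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def no_call_summary(weights, means, sigma, half_width):
    """Return (call_rate, error_rate_among_called) for a 3-component mixture.

    Assumptions (illustrative only): equal sigma for all components,
    thresholds at midpoints between adjacent means, and a symmetric
    no-call band of the given half-width around each threshold.
    """
    c1 = 0.5 * (means[0] + means[1])
    c2 = 0.5 * (means[1] + means[2])
    # Intervals of the measurement scale in which genotype 0, 1, 2 is called.
    intervals = [
        (-inf, c1 - half_width),
        (c1 + half_width, c2 - half_width),
        (c2 + half_width, inf),
    ]
    called = 0.0   # probability that an observation receives any call
    correct = 0.0  # probability that it receives the correct call
    for true_g, (w, mu) in enumerate(zip(weights, means)):
        for called_g, (lo, hi) in enumerate(intervals):
            if hi <= lo:
                continue  # band so wide that this genotype is never called
            mass = w * (norm_cdf(hi, mu, sigma) - norm_cdf(lo, mu, sigma))
            called += mass
            if called_g == true_g:
                correct += mass
    return called, 1.0 - correct / called


# Hypothetical example: allele frequency 0.3 under Hardy-Weinberg proportions.
example_weights = (0.49, 0.42, 0.09)
example_means = (-1.0, 0.0, 1.0)
example_sigma = 0.25
```

Calling `no_call_summary(example_weights, example_means, example_sigma, w)` for increasing `w` shows both quantities falling together, which is the pair of opposing effects on power that the abstract describes. Computing the power of the 2 × 3 chi-squared test itself would require additional modeling choices that the abstract does not specify.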
