Loss of statistical power to distinguish populations when some samples are ambiguous.

Martin O'Hely, Montgomery Slatkin

Department of Integrative Biology, University of California, 3060 Valley Life Sciences, Bldg. 3140 (4151-4155 VLSB), Berkeley, CA 94720-3140, USA.

Theor Popul Biol. 2003 Sep;64(2):177-92. doi: 10.1016/s0040-5809(03)00084-4.

Case-control studies are used to map loci associated with a genetic disease. The usual case-control study tests for significant differences in allele frequencies at marker loci. In this paper, we consider the problem of comparing two or more marker loci simultaneously and testing for significant differences in haplotype rather than allele frequencies. We consider two situations. In the first, genotypes at marker loci are resolved into haplotypes by biochemical methods or by genotyping family members. In the second, genotypes at marker loci are not resolved into haplotypes, but, by assuming random mating, haplotypes can be inferred using a likelihood method such as the expectation-maximization (EM) algorithm. We assume that a causative locus has two alleles with a multiplicative effect on the penetrance of a disease, with one allele increasing the penetrance by a factor π. For small values of π − 1 and large sample sizes, we obtain asymptotic results that predict the statistical power of a test for significant differences in haplotype frequencies between cases and a random sample of the population, both when haplotypes can be resolved and when they have to be inferred. The gain in power from resolving haplotypes can be expressed as a ratio R: the factor by which the sample size must be increased when haplotypes are not resolved in order to achieve the same power as when they are resolved. In general, R depends on the pattern of linkage disequilibrium between the causative allele and the marker haplotypes but is independent of the frequency of the causative allele and, to a first approximation, independent of π. For the special situation of two di-allelic marker loci, we obtain a simple expression for R and its upper bound.
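The likelihood approach for unresolved genotypes can be illustrated for the simplest case of two di-allelic markers. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it estimates the four haplotype frequencies with the EM algorithm under random mating (only double heterozygotes are phase-ambiguous), then forms a likelihood-ratio statistic comparing separate fits for cases and controls against a pooled fit. The function names (`em_haplotype_freqs`, `log_likelihood`, `haplotype_lrt`), the 0/1/2 genotype coding, and the choice of a likelihood-ratio statistic are assumptions made for this example; it does not compute the power formulas or the ratio R derived in the paper.

```python
import math

# Hypothetical coding: two di-allelic loci, alleles 0/1. Haplotype (a, b)
# has index 2*a + b, so the order is 00, 01, 10, 11.
N_HAP = 4


def _phase(g1, g2):
    """Return the haplotype-index pair for a phase-known genotype.

    g1, g2 count copies of allele 1 (0, 1 or 2) at each locus.
    Valid only when at most one locus is heterozygous.
    """
    a = [1] * g1 + [0] * (2 - g1)
    b = [1] * g2 + [0] * (2 - g2)
    return 2 * a[0] + b[0], 2 * a[1] + b[1]


def em_haplotype_freqs(genotypes, max_iter=500, tol=1e-10):
    """EM estimate of the four haplotype frequencies, assuming random mating."""
    freqs = [0.25] * N_HAP
    for _ in range(max_iter):
        counts = [0.0] * N_HAP
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: split a double heterozygote between the
                # cis phase (00/11) and the trans phase (01/10).
                cis = freqs[0] * freqs[3]
                trans = freqs[1] * freqs[2]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                for h, c in ((0, w), (3, w), (1, 1 - w), (2, 1 - w)):
                    counts[h] += c
            else:
                h1, h2 = _phase(g1, g2)
                counts[h1] += 1
                counts[h2] += 1
        # M-step: expected haplotype counts divided by 2n chromosomes.
        new = [c / (2 * len(genotypes)) for c in counts]
        done = max(abs(x - y) for x, y in zip(new, freqs)) < tol
        freqs = new
        if done:
            break
    return freqs


def log_likelihood(genotypes, freqs):
    """Log-likelihood of unphased genotypes under Hardy-Weinberg proportions."""
    ll = 0.0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            p = 2 * (freqs[0] * freqs[3] + freqs[1] * freqs[2])
        else:
            h1, h2 = _phase(g1, g2)
            p = freqs[h1] * freqs[h2] * (1 if h1 == h2 else 2)
        ll += math.log(p)
    return ll


def haplotype_lrt(cases, controls):
    """Likelihood-ratio statistic for equal haplotype frequencies in two samples."""
    f_cases = em_haplotype_freqs(cases)
    f_controls = em_haplotype_freqs(controls)
    pooled = cases + controls
    f_pooled = em_haplotype_freqs(pooled)
    return 2 * (log_likelihood(cases, f_cases)
                + log_likelihood(controls, f_controls)
                - log_likelihood(pooled, f_pooled))


if __name__ == "__main__":
    # Made-up genotypes, for illustration only.
    cases = [(1, 1)] * 30 + [(2, 2)] * 20 + [(0, 0)] * 10 + [(1, 0)] * 5
    controls = [(1, 1)] * 20 + [(0, 0)] * 30 + [(2, 2)] * 10 + [(0, 1)] * 5
    print(em_haplotype_freqs(cases))
    print(haplotype_lrt(cases, controls))
```

Under the null hypothesis the statistic is approximately chi-square with 3 degrees of freedom when all four haplotype frequencies are away from zero; `scipy.stats.chi2.sf(stat, 3)` would give the corresponding p-value. When no sample is a double heterozygote, every genotype is phase-known and `em_haplotype_freqs` simply returns the observed haplotype proportions, which mirrors the situation in which haplotypes are resolved.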
