Langaas Mette, Bakke Øyvind
Stat Appl Genet Mol Biol. 2014 Dec;13(6):675-92. doi: 10.1515/sagmb-2013-0084.
In genetic association studies, detecting disease-genotype association is a primary goal. We study seven robust test statistics for such association when the underlying genetic model is unknown, for data on disease status (case or control) and genotype (three genotypes of a biallelic genetic marker). In such studies, p-values have predominantly been calculated by asymptotic approximations or by simulated permutations. We consider an exact method, conditional enumeration. When the number of simulated permutations tends to infinity, the permutation p-value approaches the conditional enumeration p-value, but calculating the latter is much more efficient than performing simulated permutations. We have studied case-control sample sizes with 500-5000 cases and 500-15,000 controls, and significance levels from 5 × 10⁻⁸ to 0.05; thus, our results are applicable to genetic association studies with only a few genetic markers under study, intermediate follow-up studies, and genome-wide association studies. Our main findings are: (i) If all monotone genetic models are of interest, the best performance in the situations under study is achieved for the robust test statistics based on the maximum over a range of Cochran-Armitage trend tests with different scores and for the constrained likelihood ratio test. (ii) For significance levels below 0.05, for the test statistics under study, asymptotic approximations may give a test size up to 20 times the nominal level, and should therefore be used with caution. (iii) Calculating p-values based on exact conditional enumeration is a powerful, valid and computationally feasible approach, and we advocate its use in genetic association studies.
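To illustrate the conditional enumeration idea, here is a minimal Python sketch. It is not the authors' implementation. It assumes the MAX statistic is taken over three Cochran-Armitage score vectors: recessive (0, 0, 1), additive (0, 0.5, 1) and dominant (0, 1, 1). This is a common choice, narrower than the range of scores studied in the paper, and the constrained likelihood ratio test is not covered. All function names are hypothetical. Conditional on the genotype and case/control margins, the case counts follow a multivariate hypergeometric distribution under the null hypothesis. The exact p-value is therefore the total probability of all 2 × 3 tables whose statistic is at least the observed one.

```python
import math
import random

# Assumed score vectors (genotypes coded by 0, 1, 2 copies of the risk allele):
# recessive, additive, dominant. The paper maximizes over a range of scores;
# this three-score "MAX3" choice is only an illustration.
SCORES = ((0.0, 0.0, 1.0), (0.0, 0.5, 1.0), (0.0, 1.0, 1.0))


def trend_z(cases, totals, scores):
    """Standardized Cochran-Armitage trend statistic, conditional on margins."""
    R = sum(cases)
    N = sum(totals)
    if N < 2:
        return 0.0
    S = N - R
    sx = sum(x * n for x, n in zip(scores, totals))
    sxx = sum(x * x * n for x, n in zip(scores, totals))
    # Hypergeometric mean and variance of sum_j x_j * r_j given all margins.
    mean = R * sx / N
    var = R * S * (N * sxx - sx * sx) / (N * N * (N - 1))
    if var <= 0:
        return 0.0
    return (sum(x * r for x, r in zip(scores, cases)) - mean) / math.sqrt(var)


def max_trend(cases, totals):
    """MAX statistic: largest |Z| over the assumed score vectors."""
    return max(abs(trend_z(cases, totals, x)) for x in SCORES)


def log_comb(n, k):
    """log of the binomial coefficient C(n, k), via lgamma for large n."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def exact_pvalue(cases, controls):
    """Conditional enumeration p-value for the MAX statistic."""
    totals = [r + s for r, s in zip(cases, controls)]
    R, N = sum(cases), sum(totals)
    observed = max_trend(cases, totals)
    log_denom = log_comb(N, R)
    p = 0.0
    # Enumerate every case vector (r0, r1, r2) with sum R and 0 <= r_j <= n_j.
    for r0 in range(max(0, R - totals[1] - totals[2]), min(totals[0], R) + 1):
        for r1 in range(max(0, R - r0 - totals[2]), min(totals[1], R - r0) + 1):
            r2 = R - r0 - r1
            if max_trend((r0, r1, r2), totals) >= observed - 1e-9:
                # Multivariate hypergeometric probability of this table.
                p += math.exp(log_comb(totals[0], r0) + log_comb(totals[1], r1)
                              + log_comb(totals[2], r2) - log_denom)
    return min(p, 1.0)


def permutation_pvalue(cases, controls, n_perm=2000, seed=1):
    """Simulated-permutation estimate; tends to exact_pvalue as n_perm grows."""
    rng = random.Random(seed)
    totals = [r + s for r, s in zip(cases, controls)]
    R = sum(cases)
    # One entry per individual, holding that individual's genotype (0, 1, 2).
    genotypes = [g for g, n in enumerate(totals) for _ in range(n)]
    observed = max_trend(cases, totals)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(genotypes)
        perm_cases = [0, 0, 0]
        for g in genotypes[:R]:  # first R shuffled individuals become cases
            perm_cases[g] += 1
        if max_trend(perm_cases, totals) >= observed - 1e-9:
            hits += 1
    return hits / n_perm
```

For example, with `cases = [15, 22, 13]` and `controls = [20, 22, 8]`, the results of `exact_pvalue` and `permutation_pvalue` (2000 permutations) should agree to within Monte Carlo error. The enumeration visits at most (min(n0, R) + 1)(min(n1, R) + 1) tables, whereas the permutation estimate converges only as `n_perm` grows. This pure-Python loop is meant for illustration and would need vectorizing for samples of the size studied in the paper.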