Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota 55455-0392, USA.
Genet Epidemiol. 2011 Nov;35(7):606-19. doi: 10.1002/gepi.20609. Epub 2011 Jul 18.
In anticipation of the availability of next-generation sequencing data, there is increasing interest in investigating association between complex traits and rare variants (RVs). Association studies for common variants (CVs) are different. Because RVs have low frequencies, common wisdom suggests that existing statistical tests for CVs might not work. This has motivated the recent development of several new tests for analyzing RVs, most of which are based on the idea of pooling/collapsing RVs. However, evaluations of existing tests, and thus guidance on their use, are lacking. Here we provide a comprehensive comparison of various statistical tests using simulated data. We consider both independent and correlated rare mutations, as well as representative tests for both CVs and RVs.

As expected, suppose a locus of interest contains no or few non-causal (i.e. neutral or non-associated) RVs, and the effects of the causal RVs on the trait are all (or mostly) in the same direction (i.e. either protective or deleterious, but not both). In that case, two kinds of test perform similarly and are the most powerful: the simple pooled association tests (which do not select RVs or their association directions) and a new test called kernel-based adaptive clustering (KBAC). KBAC is more robust than the simple pooled association tests in the presence of non-causal RVs. However, the number of non-causal CVs may increase, or opposite association directions may be present, or both. Then the most powerful tests are two methods originally proposed for CVs and a new test for RVs called the C-alpha test. Each of these can be regarded as testing a variance component in a random-effects model. Interestingly, several methods based on sequential model selection (i.e. selecting causal RVs and their association directions) perform robustly and often have statistical power between those of the above two classes. These include two new methods proposed here.
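A small sketch shows why pooling loses power when effects go in opposite directions, while a variance-component-type statistic such as C-alpha does not. It is a minimal illustration under stated assumptions, not the procedures evaluated in the study. The pooled test here is one simple form of collapsing, assumed for illustration. It reduces all RVs in a locus to a single carrier indicator and compares carrier frequencies between cases and controls with a 1-df Pearson chi-square test. The C-alpha statistic uses its usual binomial formulation with a normal approximation. Under the null, the case count `y_j` among the `n_j` copies of variant `j` is Binomial(`n_j`, `p0`), where `p0` is the case fraction. The function names and the toy counts (100 cases, 100 controls, two variants with 10 copies each, every copy in a different person) are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pooled_carrier_test(case_carriers: int, n_cases: int,
                        control_carriers: int, n_controls: int) -> Tuple[float, float]:
    """Collapsing test (illustrative): compare the fraction of people carrying
    any RV in the locus between cases and controls with a 1-df Pearson
    chi-square. Returns (statistic, p-value)."""
    a, b = case_carriers, n_cases - case_carriers
    c, d = control_carriers, n_controls - control_carriers
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate 2x2 table: no information about association
    stat = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability mass function."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def c_alpha_test(counts: Sequence[Tuple[int, int]], n_cases: int,
                 n_controls: int) -> Tuple[float, float]:
    """C-alpha test (sketch). counts[j] = (n_j, y_j): copies of variant j in
    the whole sample and among cases. Returns (Z, one-sided p-value)."""
    p0 = n_cases / (n_cases + n_controls)
    t = 0.0
    var = 0.0
    for n_j, y_j in counts:
        null_var = n_j * p0 * (1 - p0)
        # Excess squared deviation from the null expectation. It is positive
        # whether copies cluster in cases (risk) or in controls (protective),
        # so opposite effects add up instead of cancelling.
        t += (y_j - n_j * p0) ** 2 - null_var
        # Exact null variance of that term, summing over the Binomial(n_j, p0) law.
        var += sum(((u - n_j * p0) ** 2 - null_var) ** 2 * binom_pmf(u, n_j, p0)
                   for u in range(n_j + 1))
    if var == 0:
        return 0.0, 1.0
    z = t / math.sqrt(var)
    # Over-dispersion gives large positive Z, so use the upper normal tail.
    return z, 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    # Opposite directions: variant A seen only in cases, variant B only in controls.
    print(pooled_carrier_test(10, 100, 10, 100))     # carriers balance out: (0.0, 1.0)
    print(c_alpha_test([(10, 10), (10, 0)], 100, 100))
    # Same direction: both variants seen only in cases.
    print(pooled_carrier_test(20, 100, 0, 100))
    print(c_alpha_test([(10, 10), (10, 10)], 100, 100))
```

In this toy setting the pooled statistic is exactly zero when the two variants act in opposite directions. It is large when both act in the same direction. The C-alpha Z (about 9.49, i.e. the square root of 90) is identical in both cases, because each variant enters through a squared deviation. This mirrors the qualitative contrast described in the abstract between pooled tests and variance-component-type tests.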