Peloso Gina M, Rader Daniel J, Gabriel Stacey, Kathiresan Sekar, Daly Mark J, Neale Benjamin M
Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA, USA.
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Eur J Hum Genet. 2016 Jun;24(6):924-30. doi: 10.1038/ejhg.2015.197. Epub 2015 Sep 9.
Currently, next-generation sequencing studies aim to identify rare and low-frequency variation that may contribute to disease. For a given effect size, as the allele frequency decreases, the power to detect genes or variants of interest also decreases. Although many methods have been proposed for the analysis of such data, study-design and analytic issues still complicate data interpretation. In this study, we present sequencing data for ABCA1, a gene with known rare variants associated with high-density lipoprotein cholesterol (HDL-C). We contrast empirical findings from two study designs: a phenotypic extreme sample and a population-based random sample. We found differing strengths of association with HDL-C across the two study designs (P=0.0006 with n=701 phenotypic extremes vs P=0.03 with n=1600 randomly sampled individuals). To explore this apparent difference in evidence for association, we performed a simulation study focused on the impact of phenotypic selection on power. We demonstrate that the power gain from an extreme phenotypic selection design is much greater in rare variant studies than in studies of common variants. Our study confirms that studying phenotypic extremes is critical in rare variant studies because it boosts power in two ways: the typical power increase from extreme sampling, and an increase in the proportion of relevant functional variants that are ascertained and thereby tested for association. Furthermore, we show that when statistical evidence from an extreme-selected sample and a separate population-based random sample is combined through meta-analysis, power is lower with traditional sample-size weighting than with weighting by the noncentrality parameter.
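The final point about meta-analysis weighting can be illustrated with a weighted z-score (Stouffer-type) combination, Z = Σ w_i z_i / sqrt(Σ w_i²). The sketch below is not necessarily the authors' exact procedure; it assumes each study's z-score is distributed as N(λ_i, 1), where λ_i is that study's noncentrality parameter, and that the studies are independent. The usual sample-size scheme sets w_i = sqrt(n_i), whereas weights proportional to λ_i maximize the expected combined z-score (by the Cauchy-Schwarz inequality). The sample sizes below come from the abstract, but the NCP values are hypothetical numbers chosen only for illustration, and the function names are made up for this example.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def combined_mean_z(ncps, weights):
    """Expected value of sum(w * z) / sqrt(sum(w^2)),
    assuming each study's z_i ~ N(ncp_i, 1) and independent studies."""
    num = sum(w * lam for w, lam in zip(weights, ncps))
    den = sum(w * w for w in weights) ** 0.5
    return num / den


def two_sided_power(mean_z, alpha=0.05):
    """Power of a two-sided z test whose statistic is N(mean_z, 1)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(mean_z - z_crit) + _STD_NORMAL.cdf(-mean_z - z_crit)


# Sample sizes from the abstract; NCPs are hypothetical (not from the paper).
sizes = [701, 1600]  # extreme-selected sample, population-based random sample
ncps = [3.0, 2.0]    # hypothetical expected z-scores per study

# Traditional sample-size weighting: w_i = sqrt(n_i).
mean_n = combined_mean_z(ncps, [n ** 0.5 for n in sizes])
# Noncentrality-parameter weighting: w_i proportional to lambda_i.
mean_ncp = combined_mean_z(ncps, ncps)

power_n = two_sided_power(mean_n)
power_ncp = two_sided_power(mean_ncp)

print(f"sqrt(n) weights: E[Z]={mean_n:.2f}, power={power_n:.3f}")
print(f"NCP weights:     E[Z]={mean_ncp:.2f}, power={power_ncp:.3f}")
```

With these made-up inputs, the smaller extreme-selected sample carries the larger noncentrality parameter, so sqrt(n) weighting under-weights it and yields a lower expected combined z-score and lower power than NCP weighting. In practice the λ_i are unknown and would have to be estimated or modeled.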