He Zihuai, Xu Bin, Lee Seunggeun, Ionita-Laza Iuliana
Department of Biostatistics, Columbia University, New York, NY 10032, USA.
Department of Psychiatry, Columbia University, New York, NY 10032, USA.
Am J Hum Genet. 2017 Sep 7;101(3):340-352. doi: 10.1016/j.ajhg.2017.07.011. Epub 2017 Aug 24.
Substantial progress has been made in the functional annotation of genetic variation in the human genome. Integrative analysis that incorporates such functional annotations into sequencing studies can aid the discovery of disease-associated genetic variants, especially those with unknown function and located outside protein-coding regions. Directly incorporating a single functional annotation as a weight in existing dispersion and burden tests can cause a substantial loss of power when the annotation is not predictive of a variant's risk status. Here, we develop unified tests that use multiple functional annotations simultaneously in integrative association analysis, together with efficient computational techniques. We show that the proposed tests significantly improve power when variant risk status can be predicted by functional annotations. Importantly, when functional annotations are not predictive of risk status, the proposed tests incur only a minimal loss of power relative to existing dispersion and burden tests, and under certain circumstances they can even improve power by learning, in a data-adaptive manner, a weight that better approximates the underlying disease model. The tests can be constructed from the summary statistics of existing dispersion and burden tests for sequencing data, which allows meta-analysis of multiple studies without sharing individual-level data. We applied the proposed tests to a meta-analysis of noncoding rare variants for lipid traits in Metabochip data on 12,281 individuals from eight studies. By incorporating the Eigen functional score, we detected significant associations between noncoding rare variants in SLC22A3 and both low-density lipoprotein and total cholesterol, associations that standard dispersion and burden tests miss.
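The abstract's point that these tests can be built from summary statistics, so that studies never share individual-level data, can be illustrated with a minimal sketch. It assumes the standard score-statistic setup: each study reports a per-variant score vector U_k and its covariance V_k for a variant set, and a fixed-effect meta-analysis pools them by summation. It then computes an annotation-weighted burden test from the pooled summaries. As a crude, conservative stand-in for combining several annotations, it takes a Bonferroni-adjusted minimum p-value. This does NOT implement the authors' unified tests or their data-adaptive weight learning. All function names, the annotation name, and the toy numbers are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def pool_summary_stats(
    scores: Sequence[Sequence[float]], covs: Sequence[Sequence[Sequence[float]]]
) -> Tuple[Vector, Matrix]:
    """Fixed-effect pooling: sum per-study score vectors U_k and covariances V_k."""
    m = len(scores[0])
    U = [0.0] * m
    V = [[0.0] * m for _ in range(m)]
    for u_k, v_k in zip(scores, covs):
        for i in range(m):
            U[i] += u_k[i]
            for j in range(m):
                V[i][j] += v_k[i][j]
    return U, V


def weighted_burden_test(U: Vector, V: Matrix, w: Sequence[float]) -> Tuple[float, float]:
    """Burden statistic (w'U)^2 / (w'Vw); asymptotically chi-square(1) under H0."""
    m = len(w)
    num = sum(w[i] * U[i] for i in range(m))
    den = sum(w[i] * V[i][j] * w[j] for i in range(m) for j in range(m))
    if den <= 0.0:
        raise ValueError("weights give zero variance; check w and V")
    stat = num * num / den
    # Upper tail of chi-square(1): P(Z^2 > t) = erfc(sqrt(t / 2)).
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p


def annotation_burden_scan(
    U: Vector, V: Matrix, weightings: Dict[str, Sequence[float]]
) -> Tuple[Dict[str, Tuple[float, float]], float]:
    """Run one burden test per weighting and Bonferroni-adjust the minimum p-value.

    Hypothetical, conservative combination; not the paper's unified test.
    """
    results = {name: weighted_burden_test(U, V, w) for name, w in weightings.items()}
    min_p = min(p for _, p in results.values())
    return results, min(1.0, min_p * len(results))


# Toy example: two studies, three variants (made-up numbers for illustration only).
U, V = pool_summary_stats(
    scores=[[1.0, 2.0, 0.5], [0.5, 1.0, -0.5]],
    covs=[
        [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]],
        [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    ],
)
results, combined_p = annotation_burden_scan(
    U, V, {"flat": [1.0, 1.0, 1.0], "hypothetical_annotation": [0.0, 1.0, 0.0]}
)
```

With these toy inputs the pooled scores are [1.5, 3.0, 0.0]. The flat-weight burden statistic is 2.25 and the annotation-weighted statistic is 3.0. Summing U_k and V_k is valid only when the studies are independent and share a common effect (fixed-effect assumption). A dispersion (SKAT-type) statistic such as the sum of w_j^2 U_j^2 can be formed from the same pooled summaries, but its null distribution is a mixture of chi-squares (for example, evaluated with Davies' method), so it is omitted here.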