Wu Baolin, Pankow James S
Division of Biostatistics, University of Minnesota.
Division of Epidemiology and Community Health, School of Public Health, University of Minnesota.
Stat Interface. 2017;10(3):379-386. doi: 10.4310/SII.2017.v10.n3.a2.
A growing number of large cohort studies have conducted or are conducting genome-wide association studies (GWAS) to reveal the genetic components of many complex human diseases. These studies often collect a broad array of correlated phenotypes that reflect common physiological processes. Jointly analyzing these correlated traits can increase power by aggregating multiple weak effects and can shed light on the mechanisms underlying complex human diseases. Most existing multi-trait association tests jointly model the multivariate traits conditional on the genotype as a covariate, and can readily accommodate imputed SNPs by using the imputed dosage as that covariate. An alternative class of multi-trait association tests is based on inverted regression, which models the distribution of genotypes conditional on the covariates and multivariate traits, and has been shown to have competitive performance. To our knowledge, all existing inverted regression approaches implicitly use the "best-guess" genotypes, which is inefficient and known to lead to dramatic power loss, and no method has been proposed for incorporating imputation uncertainty into inverted regression. In this work, we propose a general and efficient framework that accounts for imputation uncertainty to further improve the power of inverted regression association tests for imputed SNPs. We demonstrate through extensive numerical studies that the proposed method has competitive performance. We further illustrate its usefulness by applying it to association testing of diabetes-related glycemic traits in the Atherosclerosis Risk in Communities (ARIC) Study.
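To make the distinction between an imputed dosage and a "best-guess" genotype concrete, the following minimal Python sketch computes both from imputed genotype probabilities. It is an illustration only and not the framework proposed in the paper: the function name `dosage_and_best_guess` and the example probabilities are hypothetical, and the sketch assumes each SNP comes with posterior probabilities for carrying 0, 1, or 2 copies of the coded allele.

```python
import numpy as np


def dosage_and_best_guess(probs):
    """Summarize imputed genotype probabilities in two ways.

    probs: array of shape (n_subjects, 3); each row holds the assumed
    posterior probabilities of carrying 0, 1, or 2 copies of the coded allele.
    """
    probs = np.asarray(probs, dtype=float)
    # Dosage: expected allele count, which retains the imputation uncertainty.
    dosage = probs @ np.array([0.0, 1.0, 2.0])
    # Best guess: the single most probable genotype, which discards it.
    best_guess = probs.argmax(axis=1)
    return dosage, best_guess


# Hypothetical probabilities for three subjects (illustrative values only).
probs = np.array([
    [0.90, 0.09, 0.01],  # confidently imputed
    [0.40, 0.35, 0.25],  # highly uncertain
    [0.05, 0.50, 0.45],  # uncertain between 1 and 2 copies
])
dosage, best_guess = dosage_and_best_guess(probs)
print(dosage)      # expected allele counts per subject
print(best_guess)  # hard genotype calls per subject
```

For the second subject the hard call is 0 copies even though the expected allele count is close to 1, which is the kind of information loss the abstract attributes to best-guess genotypes.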