Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada.
Département de Mathématiques, Université du Québec à Montréal, Montreal, QC, Canada.
Int J Biostat. 2022 Oct 24;19(2):369-387. doi: 10.1515/ijb-2022-0010. eCollection 2023 Nov 1.
In genome-wide association studies (GWAS), researchers often deal with dichotomous and non-normally distributed traits, or with a mixture of discrete and continuous traits. However, most current region-based methods rely on multivariate linear mixed models (mvLMMs) and assume a multivariate normal distribution for the phenotypes of interest. Hence, these methods are not applicable to disease traits or other non-normally distributed traits, and there is a need for unified and flexible methods to study the association between a set of (possibly rare) genetic variants and non-normal multivariate phenotypes. Copulas are multivariate distribution functions with uniform margins on the [0, 1] interval, and they provide suitable models for handling non-normality of errors in multivariate association studies. We propose a novel unified and flexible copula-based multivariate association test (CBMAT) for discovering association between a genetic region and a bivariate continuous, binary or mixed phenotype. We also derive a data-driven analytic p-value procedure for the proposed region-based score-type test. Through simulation studies, we demonstrate that, for discrete and non-normally distributed traits, CBMAT has well-controlled type I error rates and higher power to detect associations than other existing methods. Finally, we apply CBMAT to detect the association between two genes located on chromosome 11 and several lipid levels measured on 1477 subjects from the ASLPAC study.
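To make the copula idea concrete, the following minimal sketch draws a dependent pair with uniform margins and then maps it to a mixed phenotype (one non-normal continuous trait, one binary trait). It is an illustration only and is not the CBMAT procedure: the Gaussian copula, the correlation `rho = 0.6`, the Exp(1) margin, the 30% prevalence, and the helper name `gaussian_copula_sample` are all assumptions chosen for the example.

```python
import math
import numpy as np

def gaussian_copula_sample(n, rho, rng):
    """Draw n pairs (u1, u2) from a bivariate Gaussian copula (hypothetical helper)."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    # Applying the standard normal CDF elementwise yields uniform margins on [0, 1]
    # while preserving the dependence between the two coordinates.
    phi = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))
    return phi(z)

rng = np.random.default_rng(0)
u = gaussian_copula_sample(5000, rho=0.6, rng=rng)

# Guard against log(0) in the inverse-CDF step below.
u = np.clip(u, 1e-12, 1.0 - 1e-12)

# Assumed margins: an exponential (non-normal) continuous trait and a binary trait.
y_cont = -np.log(1.0 - u[:, 0])       # inverse CDF of Exp(1)
y_bin = (u[:, 1] > 0.7).astype(int)   # Bernoulli with prevalence 0.3
```

The margins are specified separately from the dependence structure, which is what lets a copula model combine a binary and a continuous trait without assuming joint normality.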