Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, North Carolina
Genetics. 2020 May;215(1):41-58. doi: 10.1534/genetics.119.302940. Epub 2020 Mar 4.
Age-at-onset is one of the critical traits in cohort studies of age-related diseases. Large-scale genome-wide association studies (GWAS) of age-at-onset traits can provide more insight into genetic effects on disease progression and transitions between stages. Moreover, proportional hazards (or Cox) regression models can achieve higher statistical power in a cohort study than logistic regression applied to a case-control trait. Although mixed-effects models are widely used in GWAS to correct for sample dependence, application of Cox mixed-effects models (CMEMs) to large-scale GWAS has so far been hindered by intractable computational cost. In this work, we propose COXMEG, an efficient R package for conducting GWAS of age-at-onset traits using CMEMs. COXMEG introduces fast estimation algorithms for general sparse relatedness matrices including, but not limited to, block-diagonal pedigree-based matrices. COXMEG also introduces a fast and powerful score test for dense relatedness matrices, accounting for both population stratification and family structure. In addition, COXMEG generalizes existing algorithms to support positive semidefinite relatedness matrices, which are common in twin and family studies. Our simulation studies suggest that COXMEG, depending on the structure of the relatedness matrix, is orders of magnitude more computationally efficient for GWAS than coxme and coxph with frailty. We found that using a sparse approximation of relatedness matrices yielded highly comparable results in controlling the false-positive rate and retaining statistical power for an ethnically homogeneous family-based sample.
By applying COXMEG to a study of Alzheimer's disease (AD) using a sample from the National Institute on Aging Late-Onset Alzheimer's Disease Family Study comprising 3456 non-Hispanic whites and 287 African Americans, we identified the ε variant with strong statistical power (P = 1e-101), far more significant than that reported in a previous study using a transformed variable and a marginal Cox model. Furthermore, we identified a novel SNP, rs36051450 (P = 2e-9), near , the minor allele of which significantly reduced the hazards of AD in both genders. These results demonstrate that COXMEG greatly facilitates the application of CMEMs in GWAS of age-at-onset traits.