

Simultaneous SNP selection and adjustment for population structure in high dimensional prediction models.

Author information

Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, Québec, Canada.

Department of Diagnostic Radiology, McGill University, Montréal, Québec, Canada.

Publication information

PLoS Genet. 2020 May 4;16(5):e1008766. doi: 10.1371/journal.pgen.1008766. eCollection 2020 May.

Abstract

Complex traits are known to be influenced by a combination of environmental factors and rare and common genetic variants. However, detection of such multivariate associations can be compromised by low statistical power and confounding by population structure. Linear mixed effects models (LMM) can account for correlations due to relatedness but have not been applicable in high-dimensional (HD) settings where the number of fixed effect predictors greatly exceeds the number of samples. False positives or false negatives can result from two-stage approaches, where the residuals estimated from a null model adjusted for the subjects' relationship structure are subsequently used as the response in a standard penalized regression model. To overcome these challenges, we develop a general penalized LMM with a single random effect called ggmix for simultaneous SNP selection and adjustment for population structure in high dimensional prediction models. We develop a blockwise coordinate descent algorithm with automatic tuning parameter selection which is highly scalable, computationally efficient and has theoretical guarantees of convergence. Through simulations and three real data examples, we show that ggmix leads to more parsimonious models compared to the two-stage approach or principal component adjustment with better prediction accuracy. Our method performs well even in the presence of highly correlated markers, and when the causal SNPs are included in the kinship matrix. ggmix can be used to construct polygenic risk scores and select instrumental variables in Mendelian randomization studies. Our algorithms are available in an R package available on CRAN (https://cran.r-project.org/package=ggmix).
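To make the idea of a penalized LMM with a single random effect concrete, the following is a minimal sketch and not the ggmix implementation. It assumes the covariance of the response is proportional to `eta * kinship + (1 - eta) * I`, treats the variance share `eta` and the penalty `lam` as fixed and known, and omits the intercept. It shows only the coefficient update: the data are rotated by the eigenvectors of the kinship matrix so that the errors become uncorrelated, and a weighted lasso is then solved by coordinate descent. The function names (`soft_threshold`, `weighted_lasso_cd`, `penalized_lmm_beta`) and the simulated data are hypothetical; the method in the abstract additionally estimates the variance parameters in its own blocks and selects the tuning parameter automatically.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by lasso coordinate descent."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def weighted_lasso_cd(X, y, w, lam, n_iter=200, tol=1e-8):
    """Minimise 0.5 * sum_i w_i * (y_i - x_i @ beta)**2 + lam * ||beta||_1."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    resid = np.asarray(y, dtype=float).copy()   # residual when beta = 0
    beta = np.zeros(X.shape[1])
    col_norm = (w[:, None] * X**2).sum(axis=0)  # weighted squared column norms
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(X.shape[1]):
            if col_norm[j] == 0.0:
                continue
            # Weighted correlation of column j with the partial residual.
            rho = np.dot(w * X[:, j], resid) + col_norm[j] * beta[j]
            new = soft_threshold(rho, lam) / col_norm[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid -= delta * X[:, j]        # keep resid = y - X @ beta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def penalized_lmm_beta(X, y, kinship, eta, lam):
    """Lasso coefficients under Cov(y) proportional to eta*kinship + (1-eta)*I.

    Assumption for this sketch: eta in [0, 1) and lam are given, no intercept.
    """
    if not 0.0 <= eta < 1.0:
        raise ValueError("eta must lie in [0, 1)")
    d, U = np.linalg.eigh(kinship)              # kinship = U @ diag(d) @ U.T
    X_rot = U.T @ np.asarray(X, dtype=float)    # rotated predictors
    y_rot = U.T @ np.asarray(y, dtype=float)    # rotated response
    w = 1.0 / (1.0 + eta * (d - 1.0))           # inverse variances after rotation
    return weighted_lasso_cd(X_rot, y_rot, w, lam)


if __name__ == "__main__":
    # Simulated data, purely illustrative.
    rng = np.random.default_rng(0)
    n, p = 50, 200
    G = rng.standard_normal((n, p))             # stand-in for a genotype matrix
    kinship = G @ G.T / p
    beta_true = np.zeros(p)
    beta_true[:3] = 2.0
    y = G @ beta_true + rng.standard_normal(n)
    beta_hat = penalized_lmm_beta(G, y, kinship, eta=0.3, lam=5.0)
    print("non-zero coefficients:", np.flatnonzero(beta_hat))
```

The contrast with the two-stage approach is that selection here happens inside the mixed-model objective. A two-stage procedure would first remove the relatedness signal from the response and only then run an unweighted lasso on the residuals.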


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/89bf/7224575/1799ec5f7c11/pgen.1008766.g001.jpg
