Incorporating group correlations in genome-wide association studies using smoothed group Lasso.

Affiliation

School of Public Health, Yale University, New Haven, CT 06520, USA.

Publication information

Biostatistics. 2013 Apr;14(2):205-19. doi: 10.1093/biostatistics/kxs034. Epub 2012 Sep 17.

DOI:10.1093/biostatistics/kxs034

PMID:22988281

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3590928/

Abstract

In genome-wide association studies, penalization is an important approach for identifying genetic markers associated with disease. Single nucleotide polymorphisms have a natural grouping structure and, more importantly, the groups are correlated. Motivated by this, we propose a new penalization method for group variable selection that properly accommodates the correlation between adjacent groups. The method combines the group Lasso penalty with a quadratic penalty on the difference of regression coefficients of adjacent groups. We refer to it as the smoothed group Lasso (SGL). It encourages group sparsity and smooths the regression coefficients of adjacent groups. The weights between groups in the quadratic difference penalty are based on canonical correlations. We first derive a group coordinate descent (GCD) algorithm for computing the solution path in the linear regression model. The SGL method is then extended to logistic regression for binary responses. With the help of the majorization-minimization (MM) algorithm, SGL-penalized logistic regression becomes an iteratively penalized least-squares problem. We also suggest conducting principal component analysis to reduce the dimensionality within groups. Simulation studies are used to evaluate the finite-sample performance. Comparison with group Lasso shows that SGL is more effective in selecting true positives. Two datasets are analyzed using the SGL method.
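The abstract describes the ingredients of SGL only in words, so a small numerical sketch may help make them concrete. The code below is an illustration under stated assumptions, not the authors' implementation: the function names, the group-size scaling, the choice to penalize differences of scaled group norms, and the particular quadratic majorizer for the logistic loss are all hypothetical choices made here; the paper's exact definitions may differ.

```python
import numpy as np


def top_canonical_correlation(Xa, Xb):
    """Largest canonical correlation between two blocks of predictors.

    Assumes both blocks have full column rank after centering; the
    canonical correlations are then the singular values of Qa^T Qb,
    where Qa and Qb are orthonormal bases of the centered blocks.
    """
    Xa = Xa - Xa.mean(axis=0)
    Xb = Xb - Xb.mean(axis=0)
    Qa, _ = np.linalg.qr(Xa)
    Qb, _ = np.linalg.qr(Xb)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return float(min(s[0], 1.0))  # clip tiny floating-point overshoot


def sgl_penalty(betas, weights, lam1, lam2):
    """Illustrative smoothed-group-Lasso-style penalty (assumed form).

    betas   : list of 1-D coefficient arrays, one per group, in order
    weights : length J-1 array of weights between adjacent groups
    lam1    : tuning parameter for the group Lasso term
    lam2    : tuning parameter for the quadratic smoothing term
    """
    sizes = np.array([len(b) for b in betas], dtype=float)
    norms = np.array([np.linalg.norm(b) for b in betas])
    # Group Lasso term: encourages whole groups to be zero.
    group_term = lam1 * np.sum(np.sqrt(sizes) * norms)
    # Smoothing term: quadratic penalty on differences between
    # size-scaled norms of adjacent groups (an assumption made here).
    scaled = norms / np.sqrt(sizes)
    smooth_term = 0.5 * lam2 * np.sum(np.asarray(weights) * np.diff(scaled) ** 2)
    return float(group_term + smooth_term)


def mm_working_response(eta, y):
    """Working response from a standard quadratic majorizer of the logistic loss.

    The second derivative of the logistic loss is at most 1/4, so at the
    current linear predictor eta the loss is majorized, up to a constant,
    by (1/8) * (z - eta_new)^2 with z = eta + 4 * (y - p). Each MM step
    is therefore a penalized least-squares problem in z.
    """
    p = 1.0 / (1.0 + np.exp(-eta))
    return eta + 4.0 * (y - p)
```

In this sketch, the weights passed to `sgl_penalty` could be obtained by calling `top_canonical_correlation` on each pair of adjacent SNP groups, so that strongly correlated neighbouring groups are pushed toward similar effect sizes while weakly correlated ones are smoothed less.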
