SNP selection in genome-wide and candidate gene studies via penalized logistic regression.

Affiliation

Institute of Human Genetics, Central Parkway, Newcastle upon Tyne, United Kingdom.

Publication information

Genet Epidemiol. 2010 Dec;34(8):879-91. doi: 10.1002/gepi.20543.

PMID:21104890

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3410531/

Abstract

Penalized regression methods offer an attractive alternative to single-marker testing in genetic association analysis. They shrink toward zero the coefficients of markers that have little apparent effect on the trait of interest, yielding a parsimonious subset of what we hope are the truly pertinent predictors. Here we explore the performance of penalization in selecting SNPs as predictors in genetic association studies. The strength of the penalty can be chosen to select a good predictive model (via methods such as computationally expensive cross-validation), through a maximum likelihood-based model selection criterion (such as the BIC), or, as done here, to select a model that controls the type I error. We investigated the performance of several penalized logistic regression approaches, simulating data under a variety of disease locus effect sizes and linkage disequilibrium patterns. We compared several penalties, including the elastic net, ridge, Lasso, MCP and the normal-exponential-γ shrinkage prior implemented in the hyperlasso software, with standard single-locus analysis and simple forward stepwise regression. We examined how markers enter the model as penalties and P-value thresholds are varied, and report the sensitivity and specificity of each method. Results show that penalized methods outperform single-marker analysis; the main difference is that penalized methods allow the simultaneous inclusion of a number of markers and generally do not allow correlated variables to enter the model, producing a sparse model in which most of the identified explanatory markers are accounted for.
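To make the idea of markers entering the model as the penalty is relaxed more concrete, the following is a minimal sketch of Lasso-penalized logistic regression on simulated genotypes. It is not the authors' implementation or simulation design: the solver (plain proximal gradient descent), the function names, the sample size, the allele frequency, the causal SNP indices and the effect sizes are all assumptions chosen for illustration, and the SNPs are simulated independently, so linkage disequilibrium is not modelled. The objective minimized is the average negative log-likelihood plus `lam` times the sum of absolute coefficients, with an unpenalized intercept.

```python
import numpy as np


def simulate_case_control(n=500, p=50, causal=(3, 17), log_odds=(0.8, 0.6),
                          maf=0.3, seed=1):
    """Simulate independent SNP genotypes (0/1/2 allele counts) and a binary trait.

    All defaults are hypothetical illustration values, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    X = rng.binomial(2, maf, size=(n, p)).astype(float)
    eta = X[:, list(causal)] @ np.asarray(log_odds)
    eta -= eta.mean()  # centre the linear predictor so cases and controls are roughly balanced
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)
    return X, y


def standardize(X):
    """Centre each SNP and scale it to unit variance so one penalty suits all markers."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against monomorphic markers
    return (X - X.mean(axis=0)) / sd


def soft_threshold(z, t):
    """Proximal operator of the L1 penalty: shrink towards zero, then truncate at zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lambda_max(Xs, y):
    """Smallest penalty at which every (standardized) SNP coefficient is exactly zero."""
    return np.max(np.abs(Xs.T @ (y - y.mean()))) / len(y)


def lasso_logistic(Xs, y, lam, n_iter=2000):
    """Fit L1-penalized logistic regression by proximal gradient descent (ISTA)."""
    n, p = Xs.shape
    Z = np.hstack([np.ones((n, 1)), Xs])
    # The Hessian of the average log-loss is bounded by Z'Z / (4n); use step = 1 / bound.
    step = 4.0 * n / np.linalg.norm(Z, 2) ** 2
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(b0 + Xs @ beta)))
        resid = mu - y
        b0 -= step * resid.mean()  # the intercept is not penalized
        beta = soft_threshold(beta - step * (Xs.T @ resid) / n, step * lam)
    return b0, beta


# Trace which markers enter the model as the penalty is relaxed.
X, y = simulate_case_control()
Xs = standardize(X)
lmax = lambda_max(Xs, y)
for frac in (1.01, 0.7, 0.4, 0.2):
    _, beta = lasso_logistic(Xs, y, frac * lmax)
    print(f"lambda = {frac:.2f} * lambda_max -> selected SNPs: {np.flatnonzero(beta).tolist()}")
```

Just above `lambda_max` no marker is selected, and the selected set generally grows as the penalty decreases. In practice the penalty would be tuned by cross-validation, an information criterion such as the BIC, or a type I error target, as the abstract describes; this sketch only prints the selection path.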
