
Ridge regression in prediction problems: automatic choice of the ridge parameter.

Affiliations

Department of Epidemiology and Biostatistics, Imperial College London, London, United Kingdom; Statistical Consulting Group, GlaxoSmithKline, Stevenage, United Kingdom.

Publication information

Genet Epidemiol. 2013 Nov;37(7):704-14. doi: 10.1002/gepi.21750. Epub 2013 Jul 26.

DOI: 10.1002/gepi.21750
PMID: 23893343
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4377081/

Abstract

To date, numerous genetic variants have been identified as associated with diverse phenotypic traits. However, identified associations generally explain only a small proportion of trait heritability and the predictive power of models incorporating only known-associated variants has been small. Multiple regression is a popular framework in which to consider the joint effect of many genetic variants simultaneously. Ordinary multiple regression is seldom appropriate in the context of genetic data, due to the high dimensionality of the data and the correlation structure among the predictors. There has been a resurgence of interest in the use of penalised regression techniques to circumvent these difficulties. In this paper, we focus on ridge regression, a penalised regression approach that has been shown to offer good performance in multivariate prediction problems. One challenge in the application of ridge regression is the choice of the ridge parameter that controls the amount of shrinkage of the regression coefficients. We present a method to determine the ridge parameter based on the data, with the aim of good performance in high-dimensional prediction problems. We establish a theoretical justification for our approach, and demonstrate its performance on simulated genetic data and on a real data example. Fitting a ridge regression model to hundreds of thousands to millions of genetic variants simultaneously presents computational challenges. We have developed an R package, ridge, which addresses these issues. Ridge implements the automatic choice of ridge parameter presented in this paper, and is freely available from CRAN.
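
The role of the ridge parameter described in the abstract can be illustrated with a minimal sketch. It assumes the standard closed-form ridge estimate, (X'X + lambda I)^(-1) X'y, applied to simulated data with more predictors than observations and correlated columns. The function name `ridge_fit`, the simulated data, and the grid of lambda values are all hypothetical choices for illustration. This is not the automatic selection rule proposed in the paper, and it is not the interface of the R package ridge.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam * I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Simulated data (illustrative assumption): p > n with correlated predictors,
# the setting in which ordinary least squares has no unique solution.
rng = np.random.default_rng(0)
n, p = 50, 200
Z = rng.standard_normal((n, p))
X = Z.copy()
for j in range(1, p):
    # Each column depends on its left neighbour, giving local correlation.
    X[:, j] = 0.8 * X[:, j - 1] + 0.6 * Z[:, j]
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise the predictors

beta_true = np.zeros(p)
beta_true[:10] = 0.5                      # only a few predictors have an effect
y = X @ beta_true + rng.standard_normal(n)
y = y - y.mean()                          # centre the response

# A larger ridge parameter shrinks the coefficient vector more strongly.
lams = (0.1, 1.0, 10.0, 100.0)
norms = [np.linalg.norm(ridge_fit(X, y, lam)) for lam in lams]
for lam, nrm in zip(lams, norms):
    print(f"lambda={lam:6.1f}  ||beta||={nrm:.4f}")
```

The printed norms decrease as lambda grows, which is the shrinkage the abstract refers to. Choosing lambda from the data, as the paper proposes, amounts to picking one point on this path.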


Similar articles

1. Ridge regression in prediction problems: automatic choice of the ridge parameter.
   Genet Epidemiol. 2013 Nov;37(7):704-14. doi: 10.1002/gepi.21750. Epub 2013 Jul 26.
2. Genetic prediction of quantitative lipid traits: comparing shrinkage models to gene scores.
   Genet Epidemiol. 2014 Jan;38(1):72-83. doi: 10.1002/gepi.21777. Epub 2013 Nov 23.
3. Significance testing in ridge regression for genetic data.
   BMC Bioinformatics. 2011 Sep 19;12:372. doi: 10.1186/1471-2105-12-372.
4. The Current and Future Use of Ridge Regression for Prediction in Quantitative Genetics.
   Biomed Res Int. 2015;2015:143712. doi: 10.1155/2015/143712. Epub 2015 Jul 26.
5. Efficient Implementation of Penalized Regression for Genetic Risk Prediction.
   Genetics. 2019 May;212(1):65-74. doi: 10.1534/genetics.119.302019. Epub 2019 Feb 26.
6. How powerful are summary-based methods for identifying expression-trait associations under different genetic architectures?
   Pac Symp Biocomput. 2018;23:228-239.
7. Prediction of complex human traits using the genomic best linear unbiased predictor.
   PLoS Genet. 2013;9(7):e1003608. doi: 10.1371/journal.pgen.1003608. Epub 2013 Jul 11.
8. A multilevel model to address batch effects in copy number estimation using SNP arrays.
   Biostatistics. 2011 Jan;12(1):33-50. doi: 10.1093/biostatistics/kxq043. Epub 2010 Jul 12.
9. Integrate multiple traits to detect novel trait-gene association using GWAS summary data with an adaptive test approach.
   Bioinformatics. 2019 Jul 1;35(13):2251-2257. doi: 10.1093/bioinformatics/bty961.
10. Pleiotropy informed adaptive association test of multiple traits using genome-wide association study summary data.
   Biometrics. 2019 Dec;75(4):1076-1085. doi: 10.1111/biom.13076. Epub 2019 Aug 2.

Cited by

1. Lagged precipitation effects on plant production across terrestrial biomes.
   Nat Ecol Evol. 2025 Jul 28. doi: 10.1038/s41559-025-02806-4.
2. Evaluating Genetic Regulators of MicroRNAs Using Machine Learning Models.
   Int J Mol Sci. 2025 Jun 16;26(12):5757. doi: 10.3390/ijms26125757.
3. Reduced myelin contributes to cognitive impairment in patients with monogenic small vessel disease.
   Alzheimers Dement. 2025 May;21(5):e70127. doi: 10.1002/alz.70127.
4. Prediction of inhibitory peptides against E.coli with desired MIC value.
   Sci Rep. 2025 Feb 8;15(1):4672. doi: 10.1038/s41598-025-86638-z.
5. Terminal differentiation and persistence of effector regulatory T cells essential for preventing intestinal inflammation.
   Nat Immunol. 2025 Mar;26(3):444-458. doi: 10.1038/s41590-024-02075-6. Epub 2025 Feb 4.
6. Methodologies underpinning polygenic risk scores estimation: a comprehensive overview.
   Hum Genet. 2024 Nov;143(11):1265-1280. doi: 10.1007/s00439-024-02710-0. Epub 2024 Oct 19.
7. Exploring the Associations of Inflammatory and Oxidative Stress Biomarkers with Pancreatic Diseases: An Observational and Mendelian Randomisation Study.
   J Clin Med. 2024 Apr 12;13(8):2247. doi: 10.3390/jcm13082247.
8. Predictors of mental health problems during the COVID-19 outbreak in Egypt in 2021.
   Front Public Health. 2023 Nov 9;11:1234201. doi: 10.3389/fpubh.2023.1234201. eCollection 2023.
9. Effect of imbalance in dietary macronutrients on blood hemoglobin levels: a cross-sectional study in young underweight Japanese women.
   Front Nutr. 2023 Jun 20;10:1121717. doi: 10.3389/fnut.2023.1121717. eCollection 2023.
10. Multivariate Sequential Analytics for Cardiovascular Disease Event Prediction.
   Methods Inf Med. 2022 Dec;61(S 02):e149-e171. doi: 10.1055/s-0042-1758687. Epub 2022 Dec 23.

References

1. Performance and robustness of penalized and unpenalized methods for genetic prediction of complex human disease.
   Genet Epidemiol. 2013 Feb;37(2):184-95. doi: 10.1002/gepi.21698. Epub 2012 Nov 30.
2. Hierarchical Naive Bayes for genetic association studies.
   BMC Bioinformatics. 2012;13 Suppl 14(Suppl 14):S6. doi: 10.1186/1471-2105-13-S14-S6. Epub 2012 Sep 7.
3. A novel method to identify high order gene-gene interactions in genome-wide association studies: gene-based MDR.
   BMC Bioinformatics. 2012 Jun 11;13 Suppl 9(Suppl 9):S5. doi: 10.1186/1471-2105-13-S9-S5.
4. Whole-genome regression and prediction methods applied to plant and animal breeding.
   Genetics. 2013 Feb;193(2):327-45. doi: 10.1534/genetics.112.143313. Epub 2012 Jun 28.
5. Genome-wide searches for bipolar disorder genes.
   Curr Psychiatry Rep. 2011 Dec;13(6):522-7. doi: 10.1007/s11920-011-0226-y.
6. Beyond missing heritability: prediction of complex traits.
   PLoS Genet. 2011 Apr;7(4):e1002051. doi: 10.1371/journal.pgen.1002051. Epub 2011 Apr 28.
7. SNP selection in genome-wide and candidate gene studies via penalized logistic regression.
   Genet Epidemiol. 2010 Dec;34(8):879-91. doi: 10.1002/gepi.20543.
8. Regularization Paths for Generalized Linear Models via Coordinate Descent.
   J Stat Softw. 2010;33(1):1-22.
9. Common SNPs explain a large proportion of the heritability for human height.
   Nat Genet. 2010 Jul;42(7):565-9. doi: 10.1038/ng.608. Epub 2010 Jun 20.
10. Missing heritability and strategies for finding the underlying causes of complex disease.
   Nat Rev Genet. 2010 Jun;11(6):446-50. doi: 10.1038/nrg2809.