Adaptive penalization in high-dimensional regression and classification with external covariates using variational Bayes.

Affiliation

Genome Biology Unit, European Molecular Biology Laboratory, Meyerhofstr. 1, 69117 Heidelberg, Germany.

Publication information

Biostatistics. 2021 Apr 10;22(2):348-364. doi: 10.1093/biostatistics/kxz034.

DOI:10.1093/biostatistics/kxz034

PMID:31596468

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8036004/

Abstract

Penalization schemes like Lasso or ridge regression are routinely used to regress a response of interest on a high-dimensional set of potential predictors. Despite being decisive, the question of the relative strength of penalization is often glossed over and only implicitly determined by the scale of individual predictors. At the same time, additional information on the predictors is available in many applications but left unused. Here, we propose to make use of such external covariates to adapt the penalization in a data-driven manner. We present a method that differentially penalizes feature groups defined by the covariates and adapts the relative strength of penalization to the information content of each group. Using techniques from the Bayesian tool-set our procedure combines shrinkage with feature selection and provides a scalable optimization scheme. We demonstrate in simulations that the method accurately recovers the true effect sizes and sparsity patterns per feature group. Furthermore, it leads to an improved prediction performance in situations where the groups have strong differences in dynamic range. In applications to data from high-throughput biology, the method enables re-weighting the importance of feature groups from different assays. Overall, using available covariates extends the range of applications of penalized regression, improves model interpretability and can improve prediction performance.
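The idea of penalizing feature groups differently can be illustrated with a minimal sketch. It is not the authors' variational Bayes procedure: it uses plain ridge regression with hand-fixed penalties per group, simulated data, and hypothetical names (`group_ridge`, the group labels `informative` and `noise`), whereas the method described in the abstract learns the relative penalty strengths from the data.

```python
import random


def group_ridge(X, y, groups, penalties, n_iter=200):
    """Ridge regression with one penalty per feature group (coordinate descent).

    X: list of n rows with p features each; y: list of n responses;
    groups: group label of each feature; penalties: label -> ridge penalty.
    Minimizes 0.5 * ||y - X b||^2 + 0.5 * sum_j penalties[groups[j]] * b_j^2.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta, since beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Inner product of feature j with the partial residual,
            # i.e. the residual with feature j's own contribution added back.
            rho = sum(X[i][j] * residual[i] for i in range(n)) + col_sq[j] * beta[j]
            new_bj = rho / (col_sq[j] + penalties[groups[j]])
            delta = new_bj - beta[j]
            for i in range(n):
                residual[i] -= X[i][j] * delta
            beta[j] = new_bj
    return beta


# Simulated data (assumption): three informative features, three pure-noise features.
random.seed(0)
groups = ["informative"] * 3 + ["noise"] * 3
true_beta = [2.0, -1.5, 1.0, 0.0, 0.0, 0.0]
X = [[random.gauss(0, 1) for _ in groups] for _ in range(50)]
y = [sum(x * b for x, b in zip(row, true_beta)) + random.gauss(0, 1) for row in X]

# Same penalty for every feature versus a much stronger penalty on the noise group.
uniform = group_ridge(X, y, groups, {"informative": 1.0, "noise": 1.0})
adaptive = group_ridge(X, y, groups, {"informative": 1.0, "noise": 1000.0})


def noise_norm(beta):
    """Sum of squared coefficients in the noise group."""
    return sum(b ** 2 for b, g in zip(beta, groups) if g == "noise")


print("noise-group size, uniform penalty: ", noise_norm(uniform))
print("noise-group size, adaptive penalty:", noise_norm(adaptive))
```

Raising the penalty on one group while holding the others fixed cannot increase that group's squared coefficient norm at the optimum, so the second printed value is the smaller one. Choosing the group penalties automatically, rather than by hand as here, is the problem the paper addresses.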

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e1a/8036004/cea527812e86/kxz034f1.jpg

Similar articles

Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data.

BMC Bioinformatics. 2011 May 9;12:138. doi: 10.1186/1471-2105-12-138.

Adaptive group-regularized logistic elastic net regression.

Biostatistics. 2021 Oct 13;22(4):723-737. doi: 10.1093/biostatistics/kxz062.

Incorporating scientific knowledge into phenotype development: penalized latent class regression.

Stat Med. 2011 Mar 30;30(7):784-98. doi: 10.1002/sim.4137. Epub 2010 Dec 5.

Penalized binary regression for gene expression profiling.

Methods Inf Med. 2004;43(5):439-44.

Flexible co-data learning for high-dimensional prediction.

Stat Med. 2021 Nov 20;40(26):5910-5925. doi: 10.1002/sim.9162. Epub 2021 Aug 26.

Prognosis of lasso-like penalized Cox models with tumor profiling improves prediction over clinical data alone and benefits from bi-dimensional pre-screening.

BMC Cancer. 2022 Oct 5;22(1):1045. doi: 10.1186/s12885-022-10117-1.

Interquantile Shrinkage and Variable Selection in Quantile Regression.

Comput Stat Data Anal. 2014 Jan 1;69:208-219. doi: 10.1016/j.csda.2013.08.006.

Comparing Bayesian Variable Selection to Lasso Approaches for Applications in Psychology.

Psychometrika. 2023 Sep;88(3):1032-1055. doi: 10.1007/s11336-023-09914-9. Epub 2023 May 23.

The reciprocal Bayesian LASSO.

Stat Med. 2021 Sep 30;40(22):4830-4849. doi: 10.1002/sim.9098. Epub 2021 Jun 14.

Cited by

Leveraging external information by guided adaptive shrinkage to improve variable selection in high-dimensional regression settings.

Int J Biostat. 2025 Sep 8. doi: 10.1515/ijb-2024-0108.

Adaptive Use of Co-Data Through Empirical Bayes for Bayesian Additive Regression Trees.

Stat Med. 2025 Feb 28;44(5):e70004. doi: 10.1002/sim.70004.

A probabilistic modeling framework for genomic networks incorporating sample heterogeneity.

Cell Rep Methods. 2025 Feb 24;5(2):100984. doi: 10.1016/j.crmeth.2025.100984. Epub 2025 Feb 14.

Fast Marginal Likelihood Estimation of Penalties for Group-Adaptive Elastic Net.

J Comput Graph Stat. 2022 Nov 9;32(3):950-960. doi: 10.1080/10618600.2022.2128809. eCollection 2023.

Feature-weighted elastic net: using "features of features" for better prediction.

Stat Sin. 2023 Jan;33(1):259-279. doi: 10.5705/ss.202020.0226.

ecpc: an R-package for generic co-data models for high-dimensional prediction.

BMC Bioinformatics. 2023 Apr 26;24(1):172. doi: 10.1186/s12859-023-05289-x.

Flexible co-data learning for high-dimensional prediction.

Stat Med. 2021 Nov 20;40(26):5910-5925. doi: 10.1002/sim.9162. Epub 2021 Aug 26.

A Detailed Catalogue of Multi-Omics Methodologies for Identification of Putative Biomarkers and Causal Molecular Networks in Translational Cancer Research.

Int J Mol Sci. 2021 Mar 10;22(6):2822. doi: 10.3390/ijms22062822.

Integration of mechanistic immunological knowledge into a machine learning pipeline improves predictions.

Nat Mach Intell. 2020 Oct;2(10):619-628. doi: 10.1038/s42256-020-00232-8. Epub 2020 Oct 12.

Design and Rationale of the ERA-CVD Consortium PREMED-CAD - Precision Medicine in Coronary Artery Disease.

Biomolecules. 2020 Jan 11;10(1):125. doi: 10.3390/biom10010125.

References

Multi-Omics Factor Analysis - a framework for unsupervised integration of multi-omics data sets.

Mol Syst Biol. 2018 Jun 20;14(6):e8124. doi: 10.15252/msb.20178124.

Drug-perturbation-based stratification of blood cancer.

J Clin Invest. 2018 Jan 2;128(1):427-445. doi: 10.1172/JCI93801. Epub 2017 Dec 11.

IPF-LASSO: Integrative L1-Penalized Regression with Penalty Factors for Prediction Based on Multi-Omics Data.

Comput Math Methods Med. 2017;2017:7691937. doi: 10.1155/2017/7691937. Epub 2017 May 4.

Multi-omics approaches to disease.

Genome Biol. 2017 May 5;18(1):83. doi: 10.1186/s13059-017-1215-1.

Reproducible RNA-seq analysis using recount2.

Nat Biotechnol. 2017 Apr 11;35(4):319-321. doi: 10.1038/nbt.3838.

DegreeCox - a network-based regularization method for survival analysis.

BMC Bioinformatics. 2016 Dec 13;17(Suppl 16):449. doi: 10.1186/s12859-016-1310-4.

Data-driven hypothesis weighting increases detection power in genome-scale multiple testing.

Nat Methods. 2016 Jul;13(7):577-80. doi: 10.1038/nmeth.3885. Epub 2016 May 30.

Variable Selection with Prior Information for Generalized Linear Models via the Prior LASSO Method.

J Am Stat Assoc. 2016;111(513):355-376. doi: 10.1080/01621459.2015.1008363. Epub 2016 May 5.

Optimal multiple testing under a Gaussian prior on the effect sizes.

Biometrika. 2015 Dec;102(4):753-766. doi: 10.1093/biomet/asv050. Epub 2015 Nov 4.

Synchronized age-related gene expression changes across multiple tissues in human and the link to complex diseases.

Sci Rep. 2015 Oct 19;5:15145. doi: 10.1038/srep15145.
