Adaptive penalization in high-dimensional regression and classification with external covariates using variational Bayes.

Affiliation

Genome Biology Unit, European Molecular Biology Laboratory, Meyerhofstr. 1, 69117 Heidelberg, Germany.

Publication information

Biostatistics. 2021 Apr 10;22(2):348-364. doi: 10.1093/biostatistics/kxz034.

Abstract

Penalization schemes like Lasso or ridge regression are routinely used to regress a response of interest on a high-dimensional set of potential predictors. Despite being decisive, the question of the relative strength of penalization is often glossed over and only implicitly determined by the scale of individual predictors. At the same time, additional information on the predictors is available in many applications but left unused. Here, we propose to make use of such external covariates to adapt the penalization in a data-driven manner. We present a method that differentially penalizes feature groups defined by the covariates and adapts the relative strength of penalization to the information content of each group. Using techniques from the Bayesian toolset, our procedure combines shrinkage with feature selection and provides a scalable optimization scheme. We demonstrate in simulations that the method accurately recovers the true effect sizes and sparsity patterns per feature group. Furthermore, it leads to improved prediction performance in situations where the groups have strong differences in dynamic range. In applications to data from high-throughput biology, the method enables re-weighting the importance of feature groups from different assays. Overall, using available covariates extends the range of applications of penalized regression, improves model interpretability and can improve prediction performance.
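
To make the idea of differential penalization concrete, the following is a minimal sketch of ridge regression with one penalty per feature group. It is not the variational Bayes procedure described in the abstract: here the group penalties are fixed by hand, whereas the proposed method adapts them from the data. The function name group_ridge, the simulated data and the two-group covariate are all assumptions made for illustration.

```python
import numpy as np

def group_ridge(X, y, groups, penalties):
    """Ridge regression with one penalty per feature group (illustrative helper).

    Minimizes ||y - X b||^2 + sum_j lambda_{g(j)} * b_j^2 in closed form,
    where g(j) is the group of feature j given by an external covariate.
    """
    # Expand the per-group penalties into one penalty per feature.
    lam = np.array([penalties[g] for g in groups], dtype=float)
    # Normal equations with a diagonal, group-dependent penalty matrix.
    return np.linalg.solve(X.T @ X + np.diag(lam), X.T @ y)

# Simulated data (assumption): 20 features in two groups of 10,
# where only the first group carries signal.
rng = np.random.default_rng(0)
n, p = 50, 20
X = rng.normal(size=(n, p))
groups = np.array([0] * 10 + [1] * 10)
beta_true = np.concatenate([rng.normal(size=10), np.zeros(10)])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Same penalty for every feature: ordinary ridge regression.
beta_uniform = group_ridge(X, y, groups, {0: 10.0, 1: 10.0})

# Hand-picked group penalties (assumption): weak shrinkage for the
# informative group, strong shrinkage for the uninformative one.
beta_adapted = group_ridge(X, y, groups, {0: 1.0, 1: 100.0})

# Squared estimation error of each fit, for comparison.
err_uniform = np.sum((beta_uniform - beta_true) ** 2)
err_adapted = np.sum((beta_adapted - beta_true) ** 2)
print(err_uniform, err_adapted)
```

With equal penalties the sketch reduces to standard ridge regression, so the only modelling choice it adds is the mapping from the external covariate to a per-feature penalty.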

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e1a/8036004/cea527812e86/kxz034f1.jpg
