

Seagull: lasso, group lasso and sparse-group lasso regularization for linear regression models via proximal gradient descent.

Affiliations

Institute of Genetics and Biometry, Leibniz Institute for Farm Animal Biology, 18196, Dummerstorf, Germany.

Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA.

Publication information

BMC Bioinformatics. 2020 Sep 15;21(1):407. doi: 10.1186/s12859-020-03725-w.

Abstract

BACKGROUND

Statistical analyses of biological problems in life sciences often lead to high-dimensional linear models. To solve the corresponding system of equations, penalization approaches are often the methods of choice. They are especially useful in the case of multicollinearity, which appears if the number of explanatory variables exceeds the number of observations or for some biological reason. Then, the model goodness of fit is penalized by some suitable function of interest. Prominent examples are the lasso, group lasso and sparse-group lasso. Here, we offer a fast and numerically cheap implementation of these operators via proximal gradient descent. The grid search for the penalty parameter is realized by warm starts. The step size between consecutive iterations is determined with backtracking line search. Finally, seagull, the R package presented here, produces complete regularization paths.
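To make the three ingredients named above concrete (proximal gradient descent, backtracking line search, warm starts along a penalty grid), here is a minimal pure-Python sketch for the plain lasso. It is not the seagull code: the function names, the loss scaling (1/2n) * ||y - Xb||^2, the initial step size and the stopping rule are all assumptions chosen for illustration.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |.| for a single coefficient."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def residual(X, y, b):
    """Return X b - y for a row-major list-of-lists X."""
    return [sum(xij * bj for xij, bj in zip(row, b)) - yi for row, yi in zip(X, y)]


def loss(X, y, b):
    """Smooth part of the objective (assumed scaling): (1 / 2n) * ||y - X b||^2."""
    r = residual(X, y, b)
    return sum(ri * ri for ri in r) / (2 * len(y))


def gradient(X, y, b):
    """Gradient of the smooth part: (1 / n) * X^T (X b - y)."""
    r = residual(X, y, b)
    n = len(y)
    return [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(len(b))]


def lasso_prox_grad(X, y, lam, b0, step=1.0, shrink=0.5, max_iter=1000, tol=1e-10):
    """Proximal gradient descent for the lasso, started from b0."""
    b = list(b0)
    for _ in range(max_iter):
        g = gradient(X, y, b)
        f_old = loss(X, y, b)
        t = step
        while True:
            # Gradient step followed by the lasso proximal operator.
            b_new = [soft_threshold(bj - t * gj, t * lam) for bj, gj in zip(b, g)]
            d = [bn - bo for bn, bo in zip(b_new, b)]
            # Backtracking: accept t once the quadratic upper bound holds.
            bound = (f_old + sum(gj * dj for gj, dj in zip(g, d))
                     + sum(dj * dj for dj in d) / (2 * t))
            if loss(X, y, b_new) <= bound + 1e-12:
                break
            t *= shrink
        converged = max(abs(dj) for dj in d) < tol
        b = b_new
        if converged:
            break
    return b


def lasso_path(X, y, lambdas):
    """Regularization path over a penalty grid, largest penalty first.

    Warm start: each solution initializes the solver for the next penalty.
    """
    b = [0.0] * len(X[0])
    path = []
    for lam in sorted(lambdas, reverse=True):
        b = lasso_prox_grad(X, y, lam, b)
        path.append((lam, b))
    return path
```

Calling `lasso_path(X, y, [0.25, 1.0, 2.0])` returns one `(penalty, coefficients)` pair per grid value. Running from the largest penalty downwards means each solve starts close to its solution, which is the usual motivation for warm starts.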

RESULTS

Publicly available high-dimensional methylation data are used to compare seagull to the established R package SGL. The results of both packages enabled a precise prediction of biological age from DNA methylation status. But even though the results of seagull and SGL were very similar (R > 0.99), seagull computed the solution in a fraction of the time needed by SGL. Additionally, seagull enables the incorporation of weights for each penalized feature.

CONCLUSIONS

The following operators for linear regression models are available in seagull: lasso, group lasso, sparse-group lasso and Integrative LASSO with Penalty Factors (IPF-lasso). Thus, seagull is a convenient envelope of lasso variants.
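As a rough illustration of how lasso, group lasso and sparse-group lasso can share one implementation, the sketch below writes the sparse-group lasso proximal step with a mixing parameter `alpha`. The parameterization (alpha = 1 gives the lasso, alpha = 0 the group lasso), the square-root-of-group-size weight and the function names are assumptions taken from the common textbook formulation, not from the seagull source. Feature weights and IPF-lasso penalty factors are left out.

```python
import math


def soft_threshold(z, t):
    """Proximal operator of t * |.| for a single coefficient."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def prox_sparse_group_lasso(v, groups, t, lam, alpha):
    """Proximal step for the sparse-group lasso penalty (assumed form):

        lam * (alpha * ||b||_1 + (1 - alpha) * sum_g sqrt(p_g) * ||b_g||_2)

    v      -- coefficients after the gradient step
    groups -- one group label per coefficient
    t      -- step size
    """
    out = [0.0] * len(v)
    for label in set(groups):
        idx = [j for j, g in enumerate(groups) if g == label]
        # Step 1: element-wise soft-thresholding (lasso part).
        u = [soft_threshold(v[j], t * alpha * lam) for j in idx]
        # Step 2: shrink the whole group towards zero (group lasso part).
        norm = math.sqrt(sum(uj * uj for uj in u))
        thresh = t * (1.0 - alpha) * lam * math.sqrt(len(idx))
        scale = max(0.0, 1.0 - thresh / norm) if norm > 0.0 else 0.0
        for j, uj in zip(idx, u):
            out[j] = scale * uj
    return out
```

With `alpha=1.0` only individual coefficients are set to zero, with `alpha=0.0` only whole groups are, and intermediate values give sparsity at both levels.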


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/afe8/7493359/96f5dc72a3f3/12859_2020_3725_Fig1_HTML.jpg
