A phase transition for finding needles in nonlinear haystacks with LASSO artificial neural networks.

Author Information

Ma Xiaoyu, Sardy Sylvain, Hengartner Nick, Bobenko Nikolai, Lin Yen Ting

Affiliations

Shandong University, Jinan, China.

Department of Mathematics, University of Geneva, Geneva, Switzerland.

Publication Information

Stat Comput. 2022;32(6):99. doi: 10.1007/s11222-022-10169-0. Epub 2022 Oct 22.

Abstract

To fit sparse linear associations, a LASSO sparsity-inducing penalty with a single hyperparameter provably allows recovery of the important features (needles) with high probability in certain regimes, even if the sample size is smaller than the dimension of the input vector (haystack). More recently, learners known as artificial neural networks (ANN) have shown great success in many machine learning tasks, in particular in fitting nonlinear associations. A small learning rate, stochastic gradient descent, and a large training set help to cope with the explosion in the number of parameters present in deep neural networks. Yet few ANN learners have been developed and studied to find needles in nonlinear haystacks. Driven by a single hyperparameter, our ANN learner, like its sparse linear counterpart, exhibits a phase transition in the probability of retrieving the needles, which we do not observe with other ANN learners. To select our penalty parameter, we generalize the universal threshold of Donoho and Johnstone (Biometrika 81(3):425-455, 1994), a rule that is better than cross-validation, which is conservative (too many false detections) and expensive. In the spirit of simulated annealing, we propose a warm-start sparsity-inducing algorithm to solve the high-dimensional, non-convex and non-differentiable optimization problem. We perform simulated and real data Monte Carlo experiments to quantify the effectiveness of our approach.
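
To make the abstract's ideas concrete, below is a minimal sketch (not the authors' code) of an L1-penalized one-hidden-layer network fitted by proximal gradient descent, with the penalty parameter warm-started from a large value and decreased toward a target, loosely in the spirit of the warm-start/annealing idea described above. The target value plugs in the classical Donoho-Johnstone universal threshold sigma*sqrt(2 log p), rescaled here (an assumption) for a loss averaged over the n observations; the paper's generalized threshold and exact penalty are not reproduced, and all function and parameter names are illustrative.

```python
# Minimal sketch, not the authors' implementation: an L1-penalized
# one-hidden-layer network fitted by proximal gradient descent, with the
# penalty parameter lambda warm-started from a large value and decreased
# toward a target value (warm-start / annealing flavor).
# The target uses the classical universal threshold sigma*sqrt(2*log p)
# of Donoho and Johnstone, rescaled (an assumption) for a loss averaged
# over n observations; the paper's generalized rule is not reproduced.
import numpy as np

rng = np.random.default_rng(0)


def soft_threshold(w, t):
    """Proximal operator of the L1 norm: shrink each entry of w toward 0 by t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


def fit_sparse_ann(X, y, n_hidden=8, lam_target=None, n_warm=5, n_steps=500, lr=1e-2):
    """Fit y ~ f(X) with an entry-wise L1 penalty on the input-layer weights W1.

    Rows of W1 that end up exactly zero correspond to discarded inputs (hay);
    non-zero rows are the selected features (needles).
    """
    n, p = X.shape
    if lam_target is None:
        # Crude plug-in of the universal threshold for a mean-squared loss
        # (illustrative default; std(y) over-estimates the noise level).
        sigma_hat = np.std(y)
        lam_target = sigma_hat * np.sqrt(2.0 * np.log(p) / n)

    W1 = np.zeros((p, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.standard_normal(n_hidden)
    b2 = 0.0

    # Warm start: geometric schedule from a large lambda down to lam_target.
    for lam in np.geomspace(10.0 * lam_target, lam_target, n_warm):
        for _ in range(n_steps):
            # Forward pass with tanh hidden units.
            H = np.tanh(X @ W1 + b1)             # (n, n_hidden)
            r = H @ w2 + b2 - y                  # residuals, (n,)

            # Gradients of the mean-squared-error loss.
            g_w2 = H.T @ r / n
            g_b2 = r.mean()
            dZ = np.outer(r, w2) * (1.0 - H**2)  # back-prop through tanh
            g_W1 = X.T @ dZ / n
            g_b1 = dZ.mean(axis=0)

            # Gradient step, then proximal (soft-threshold) step on W1 only.
            w2 -= lr * g_w2
            b2 -= lr * g_b2
            b1 -= lr * g_b1
            W1 = soft_threshold(W1 - lr * g_W1, lr * lam)

    selected = np.flatnonzero(np.abs(W1).sum(axis=1) > 0)
    return W1, selected


# Toy nonlinear haystack: p = 50 inputs, only the first two matter.
n, p = 200, 50
X = rng.standard_normal((n, p))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(n)
_, needles = fit_sparse_ann(X, y)
print("selected features:", needles)   # ideally [0, 1]
```

The choice of lambda above is only a placeholder taken from the linear Gaussian case; the paper's contribution is a generalized universal threshold adapted to the ANN setting, for which the probability of retrieving exactly the needles exhibits the phase transition mentioned in the abstract.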

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a765/9587964/0312a3654cff/11222_2022_10169_Fig1_HTML.jpg
