Lemhadri Ismael, Ruan Feng, Tibshirani Robert
Stanford University.
Proc Mach Learn Res. 2021 Apr;130:10-18.
Much work has been done recently to make neural networks more interpretable, and one approach is to arrange for the network to use only a subset of the available features. In linear models, Lasso (or ℓ1-regularized) regression assigns zero weights to the most irrelevant or redundant features, and is widely used in data science. However, the Lasso applies only to linear models. Here we introduce LassoNet, a neural network framework with global feature selection. Our approach achieves feature sparsity by allowing a feature to participate in a hidden unit only if its linear representative is active. Unlike other approaches to feature selection for neural nets, our method uses a modified objective function with constraints, and so integrates feature selection directly with parameter learning. As a result, it delivers an entire regularization path of solutions spanning a range of feature sparsity levels. In experiments with real and simulated data, LassoNet significantly outperforms state-of-the-art methods for feature selection and regression. The LassoNet method uses projected proximal gradient descent, and generalizes directly to deep networks. It can be implemented by adding just a few lines of code to a standard neural network.
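To make the idea concrete, below is a minimal, hedged sketch in PyTorch of a LassoNet-style model: a standard network augmented with a per-feature linear skip connection (the "linear representative" θ), trained by a gradient step followed by a proximal/projection step. The class and function names (LassoNetSketch, simplified_prox) and the hyperparameter values are assumptions for illustration, and the projection shown (soft-threshold θ, then clip each feature's first-layer weights to M·|θ_j|) is a simplified stand-in for the paper's exact hierarchical proximal operator, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LassoNetSketch(nn.Module):
    """One-hidden-layer net plus a linear skip connection per input feature."""
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.skip = nn.Linear(d_in, 1, bias=False)   # linear representatives theta
        self.hidden = nn.Linear(d_in, d_hidden)      # first-layer weights W
        self.out = nn.Linear(d_hidden, 1)

    def forward(self, x):
        return self.skip(x) + self.out(torch.relu(self.hidden(x)))

def simplified_prox(model, lam, M, lr):
    """Simplified stand-in for the paper's proximal step (assumption):
    soft-threshold the skip weights, then clip each feature's first-layer
    weights so that |W_j|_inf <= M * |theta_j|. A feature can reach the
    hidden layer only if its linear representative is nonzero."""
    with torch.no_grad():
        theta = model.skip.weight                    # shape (1, d_in)
        W = model.hidden.weight                      # shape (d_hidden, d_in)
        theta.copy_(torch.sign(theta) * torch.clamp(theta.abs() - lr * lam, min=0.0))
        bound = M * theta.abs()                      # per-feature bound, broadcasts over rows
        W.copy_(torch.minimum(torch.maximum(W, -bound), bound))

# Toy usage: gradient step on the loss, then the projection/prox step.
model = LassoNetSketch(d_in=20, d_hidden=32)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.randn(128, 20), torch.randn(128, 1)
for step in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
    simplified_prox(model, lam=1e-2, M=10.0, lr=1e-2)
```

Sweeping the penalty lam from small to large traces out a path of models with progressively fewer active features, mirroring the regularization path described in the abstract.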