折叠凹惩罚稀疏线性回归：局部解的稀疏性、统计性能和算法理论

Folded concave penalized sparse linear regression: sparsity, statistical performance, and algorithmic theory for local solutions.

作者信息

Liu Hongcheng, Yao Tao, Li Runze, Ye Yinyu

机构信息

Harold and Inge Marcus Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park PA 16802, USA.

Department of Statistics and the Methodology Center, The Pennsylvania State University, University Park PA 16802, USA.

出版信息

Math Program. 2017 Nov;166(1-2):207-240. doi: 10.1007/s10107-017-1114-y. Epub 2017 Feb 10.

DOI:10.1007/s10107-017-1114-y

PMID:29225375

原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC5720392/

Abstract

This paper concerns the (FCPSLR), a class of popular sparse recovery methods. Although FCPSLR yields desirable recovery performance when solved globally, computing a global solution is NP-complete. Despite some existing statistical performance analyses on local minimizers or on specific FCPSLR-based learning algorithms, it still remains open questions whether local solutions that are known to admit fully polynomial-time approximation schemes (FPTAS) may already be sufficient to ensure the statistical performance, and whether that statistical performance can be non-contingent on the specific designs of computing procedures. To address the questions, this paper presents the following threefold results: (i) Any local solution (stationary point) is a sparse estimator, under some conditions on the parameters of the folded concave penalties. (ii) Perhaps more importantly, any local solution satisfying a (SONC), which is weaker than the second-order KKT condition, yields a bounded error in approximating the true parameter with high probability. In addition, if the minimal signal strength is sufficient, the SONC solution likely recovers the oracle solution. This result also explicates that the goal of improving the statistical performance is consistent with the optimization criteria of minimizing the suboptimality gap in solving the non-convex programming formulation of FCPSLR. (iii) We apply (ii) to the special case of FCPSLR with (MCP) and show that under the condition, any SONC solution with a better objective value than the Lasso solution entails the strong oracle property. In addition, such a solution generates a (ME) comparable to the optimal but exponential-time sparse estimator given a sufficient sample size, while the worst-case ME is comparable to the Lasso in general. Furthermore, to guarantee the SONC admits FPTAS.

摘要

本文关注一类流行的稀疏恢复方法——折叠凹惩罚稀疏线性回归（FCPSLR）。尽管FCPSLR在全局求解时能产生理想的恢复性能，但计算全局解是NP完全问题。尽管已有一些关于局部极小值或基于特定FCPSLR的学习算法的统计性能分析，但局部解（已知其允许完全多项式时间近似方案（FPTAS））是否足以确保统计性能，以及该统计性能是否不依赖于计算过程的具体设计，仍然是悬而未决的问题。为解决这些问题，本文给出了以下三方面的结果：（i）在折叠凹惩罚参数的某些条件下，任何局部解（驻点）都是稀疏估计器。（ii）也许更重要的是，任何满足二阶非凸约束（SONC）（比二阶KKT条件弱）的局部解，以高概率在逼近真实参数时产生有界误差。此外，如果最小信号强度足够，SONC解可能恢复神谕解。该结果还表明，提高统计性能的目标与在求解FCPSLR的非凸规划公式时最小化次优差距的优化标准是一致的。（iii）我们将（ii）应用于具有最小凹惩罚（MCP）的FCPSLR的特殊情况，并表明在该条件下，任何目标值比套索解更好的SONC解都具有强神谕性质。此外，在样本量足够的情况下，这样的解产生的最大估计误差（ME）与最优但指数时间的稀疏估计器相当，而最坏情况下的ME通常与套索相当。此外，为保证SONC允许FPTAS。