Gunes, Funda; Bondell, Howard D.
Department of Statistics, North Carolina State University.
J Comput Graph Stat. 2012;21(2):295-314. doi: 10.1080/10618600.2012.679890. Epub 2012 Jun 14.
We develop an approach to tuning penalized regression variable selection methods by calculating the sparsest estimator contained in a confidence region of a specified level. Because confidence intervals and regions are widely familiar, tuning penalized regression methods in this way is intuitive and more easily interpreted by scientists and practitioners. More importantly, our work shows that tuning to a fixed confidence level often performs better than tuning via the common methods based on AIC, BIC, or cross-validation (CV) over a wide range of sample sizes and levels of sparsity. Additionally, we prove that by tuning with a sequence of confidence levels converging to one, asymptotic selection consistency is obtained; and with a simple two-stage procedure, an oracle property is achieved. The confidence-region-based tuning parameter is easily calculated using output from existing penalized regression computer packages. Our work also shows how to map any penalty parameter to a corresponding confidence coefficient. This mapping facilitates comparisons of tuning parameter selection methods such as AIC, BIC, and CV, and reveals that the resulting tuning parameters correspond to confidence levels that are extremely low and can vary greatly across data sets. Supplemental materials for the article are available online.
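A minimal sketch of the idea described in the abstract, assuming a linear model with the classical F-based confidence region for the OLS coefficients, and taking the "sparsest estimator contained in the region" to be the lasso solution at the largest penalty whose residual sum of squares still falls inside that region. The function name and the use of scikit-learn's lasso_path are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): choose the lasso tuning parameter
# as the largest penalty whose fit still lies inside a (1 - alpha) F-based
# confidence region for the OLS coefficients, i.e. the sparsest fit in the region.
import numpy as np
from scipy.stats import f
from sklearn.linear_model import lasso_path


def sparsest_in_confidence_region(X, y, alpha_level=0.05):
    """Assumes X columns and y are centered (no intercept), n > p.

    Returns the penalty and lasso coefficients at the largest penalty whose
    RSS satisfies RSS(beta) <= RSS_ols * (1 + p/(n-p) * F_{1-alpha; p, n-p}),
    which is the RSS form of the usual F confidence region for beta.
    """
    n, p = X.shape

    # Ordinary least squares fit and its residual sum of squares.
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss_ols = np.sum((y - X @ beta_ols) ** 2)

    # Translate the F-based confidence region into an RSS threshold.
    f_crit = f.ppf(1.0 - alpha_level, p, n - p)
    rss_bound = rss_ols * (1.0 + p / (n - p) * f_crit)

    # Lasso solution path; penalties are returned in decreasing order,
    # so the sparsest solutions come first.
    alphas, coefs, _ = lasso_path(X, y)

    for j, lam in enumerate(alphas):
        beta = coefs[:, j]
        rss = np.sum((y - X @ beta) ** 2)
        if rss <= rss_bound:
            return lam, beta  # first (largest) penalty inside the region

    # Fall back to the least-penalized fit if none satisfy the bound.
    return alphas[-1], coefs[:, -1]
```

A single call at a fixed alpha_level corresponds to tuning at one fixed confidence level; the abstract's consistency result concerns letting the confidence level tend to one (alpha_level tending to zero) as the sample size grows.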