Miller Ryan, Breheny Patrick
Department of Mathematics, Grinnell College, Grinnell, Iowa, USA.
Department of Biostatistics, University of Iowa, Iowa City, Iowa, USA.
Stat Med. 2023 Apr 30;42(9):1412-1429. doi: 10.1002/sim.9678. Epub 2023 Feb 3.
Penalized regression methods such as the lasso are a popular approach to analyzing high-dimensional data. One attractive property of the lasso is that it naturally performs variable selection. An important area of concern, however, is the reliability of these selections. Motivated by local false discovery rate methodology from the large-scale hypothesis testing literature, we propose a method for calculating a local false discovery rate for each variable under consideration by the lasso model. These rates can be used to assess the reliability of an individual feature, or to estimate the model's overall false discovery rate. The method can be used for any level of regularization. This is particularly useful for models with a few highly significant features but a high overall false discovery rate, a relatively common occurrence when using cross-validation to select a model. It is also flexible enough to be applied to many varieties of penalized likelihoods, including generalized linear models and Cox regression, and a variety of penalties, including the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD) penalty. We demonstrate the validity of this approach and contrast it with other inferential methods for penalized regression as well as with local false discovery rates for univariate hypothesis tests. Finally, we show the practical utility of our method by applying it to a case study involving gene expression in breast cancer patients.
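The relationship between per-feature local false discovery rates and a selected set's overall false discovery rate can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce how the paper constructs lasso-specific statistics. It assumes generic z-statistics with a standard normal null, a kernel density estimate of the mixture density, and a null proportion conservatively set to 1. The function name `local_fdr`, the bandwidth rule, the 0.2 threshold, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def local_fdr(z, bandwidth=None):
    """Generic local fdr estimate: fdr(z) = f0(z) / f(z), taking pi0 = 1.

    f0 is the standard normal density (theoretical null, an assumption);
    f is a Gaussian kernel density estimate built from the observed z.
    """
    z = np.asarray(z, dtype=float)
    n = z.size
    if bandwidth is None:
        # Normal-reference rule of thumb for the kernel bandwidth.
        bandwidth = 1.06 * z.std(ddof=1) * n ** (-1 / 5)
    # Kernel density estimate of the mixture density at each observed z.
    diffs = (z[:, None] - z[None, :]) / bandwidth
    f_hat = np.exp(-0.5 * diffs ** 2).sum(axis=1) / (
        n * bandwidth * np.sqrt(2 * np.pi)
    )
    # Theoretical null density at each observed z.
    f0 = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # Cap at 1 so the estimates remain valid probabilities.
    return np.minimum(1.0, f0 / f_hat)


# Synthetic illustration only: 950 null statistics and 50 shifted signals.
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0, 1, 950), rng.normal(4, 1, 50)])

fdr = local_fdr(z)

# Per-feature use: flag features whose local fdr falls below a threshold.
selected = fdr < 0.2

# Set-level use: the mean local fdr over the selected features estimates
# the false discovery rate of that selection.
est_fdr = fdr[selected].mean() if selected.any() else float("nan")
```

Averaging the local rates over a selected set is the standard link between the two quantities, which is why a set can contain a few features with very small local rates while its overall rate stays high.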