Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA.
Genet Epidemiol. 2011 Dec;35(8):755-65. doi: 10.1002/gepi.20625. Epub 2011 Sep 15.
In multilocus association analysis, since some markers may not be associated with a trait, it seems attractive to use penalized regression with the capability of automatic variable selection. Although the literature on penalized regression is growing rapidly, most of it focuses on variable selection and outcome prediction, for which penalized methods are generally more effective than their nonpenalized counterparts. For statistical inference, i.e., hypothesis testing and interval estimation, it is less clear how penalized methods perform, or even how best to apply them, largely because few studies have addressed the topic. In our motivating data from a cohort of kidney transplant recipients, it is of primary interest to assess whether a group of genetic variants is associated with a binary clinical outcome, acute rejection at 6 months. In this article, we study some technical issues and alternative implementations of hypothesis testing in Lasso penalized logistic regression, and compare their performance with each other and with several existing global tests, some of which are specifically designed as variance component tests for high-dimensional data. The most interesting, and perhaps surprising, conclusion of this study is that, for low to moderately high-dimensional data, statistical tests based on Lasso penalized regression are not necessarily more powerful than some existing global tests. In addition, in penalized regression, rather than building a test on a single selected "best" model, combining multiple tests, each built on a candidate model, might be more promising.
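The idea of combining tests built on several candidate models, rather than testing on one selected model, can be illustrated with a minimal sketch. The code below is not the procedure used in the article; it is one plausible construction under stated assumptions: each candidate model is a Lasso penalized logistic regression at a fixed penalty value, the per-model statistic is the log-likelihood gain over an intercept-only model, and the tests are combined by a permutation-calibrated minimum p-value. All function names, the penalty grid, and the simulated genotype data are hypothetical choices made for illustration.

```python
import numpy as np

def fit_lasso_logistic(X, y, lam, n_iter=200):
    """Proximal-gradient (ISTA) fit of L1-penalized logistic regression.
    The intercept is unpenalized. Returns (intercept, coefficients)."""
    n, p = X.shape
    # Step size 1/L, where L upper-bounds the Lipschitz constant of the
    # gradient of the mean logistic loss for the design [1, X].
    step = 1.0 / (0.25 * (np.linalg.norm(X, 2) ** 2 / n + 1.0))
    ybar = y.mean()
    b0, beta = np.log(ybar / (1.0 - ybar)), np.zeros(p)  # start at the null fit
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(b0 + X @ beta)))
        resid = prob - y
        b0 -= step * resid.mean()
        z = beta - step * (X.T @ resid) / n
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return b0, beta

def loglik(y, eta):
    """Bernoulli log-likelihood for linear predictor eta (numerically stable)."""
    return np.sum(y * eta - np.logaddexp(0.0, eta))

def lrt_stat(X, y, lam):
    """Twice the log-likelihood gain of the Lasso fit over the intercept-only model."""
    b0, beta = fit_lasso_logistic(X, y, lam)
    ybar = y.mean()
    null_eta = np.full(len(y), np.log(ybar / (1.0 - ybar)))
    return 2.0 * (loglik(y, b0 + X @ beta) - loglik(y, null_eta))

def combined_permutation_test(X, y, lams, n_perm=99, seed=0):
    """Per-penalty permutation p-values plus a min-p combined p-value.
    Assumption: permuting y breaks any outcome-marker association."""
    rng = np.random.default_rng(seed)
    # Row 0 holds the observed statistics; rows 1..n_perm hold permutation replicates.
    stats = np.empty((n_perm + 1, len(lams)))
    stats[0] = [lrt_stat(X, y, lam) for lam in lams]
    for b in range(1, n_perm + 1):
        y_perm = rng.permutation(y)
        stats[b] = [lrt_stat(X, y_perm, lam) for lam in lams]
    # p-value of every row for every candidate model, ranked against all rows.
    pvals = (stats[None, :, :] >= stats[:, None, :]).mean(axis=1)
    # Combine by the minimum p-value, calibrated with the same permutations.
    min_p = pvals.min(axis=1)
    combined_p = np.mean(min_p <= min_p[0])
    return pvals[0], combined_p

# Hypothetical demo: 8 simulated SNPs coded 0/1/2, two of them causal.
rng = np.random.default_rng(42)
G = rng.binomial(2, 0.3, size=(120, 8)).astype(float)
X = (G - G.mean(axis=0)) / G.std(axis=0)          # standardize columns
eta = -0.5 + 0.6 * X[:, 0] + 0.6 * X[:, 1]
y = (rng.random(120) < 1.0 / (1.0 + np.exp(-eta))).astype(float)
per_model_p, combined_p = combined_permutation_test(X, y, lams=[0.01, 0.05, 0.1])
print("per-model p-values:", per_model_p)
print("combined p-value:", combined_p)
```

Reusing one set of permutations for both the per-model p-values and the minimum-p calibration avoids a nested permutation loop; the combined p-value accounts for having looked at several penalty values, which a test reported at a single data-selected penalty would not.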