Yuan Jiang, Yunxiao He, Heping Zhang
Yuan Jiang is an assistant professor in the Department of Statistics, Oregon State University, Corvallis, Oregon 97331-4606. Yunxiao He is an associate director at the Nielsen Company, 770 Broadway, New York, New York 10003-9595. Heping Zhang is a Susan Dwight Bliss Professor in the Department of Biostatistics, Yale University School of Public Health, and a professor at the Child Study Center, Yale University School of Medicine, New Haven, Connecticut 06520-8034. Zhang is also a Chang-Jiang and 1000-plan scholar at Sun Yat-Sen University, Guangzhou, China.
J Am Stat Assoc. 2016;111(513):355-376. doi: 10.1080/01621459.2015.1008363. Epub 2016 May 5.
LASSO is a popular statistical tool, often used in conjunction with generalized linear models, that simultaneously selects variables and estimates parameters. When there are many variables of interest, as in current biological and biomedical studies, the power of LASSO can be limited. Fortunately, a large amount of biological and biomedical data has already been collected, and it may contain useful information about the importance of certain variables. This paper proposes an extension of LASSO, namely prior LASSO (pLASSO), that incorporates this prior information into penalized generalized linear models. This is achieved by adding to the LASSO criterion function a measure of the discrepancy between the prior information and the model. For linear regression, the whole solution path of the pLASSO estimator can be found with a procedure similar to Least Angle Regression (LARS). Asymptotic theory and simulation results show that pLASSO provides significant improvement over LASSO when the prior information is relatively accurate. When the prior information is less reliable, pLASSO remains robust to the misspecification. We illustrate the application of pLASSO using a real data set from a genome-wide association study.
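To make the idea of an added discrepancy term concrete, here is a minimal sketch for the linear regression case. The abstract does not state the exact criterion, so everything specific here is an assumption made for illustration: the prior information is summarized as a vector of prior-based predictions `y_prior`, the discrepancy is squared error weighted by a hypothetical tuning parameter `eta`, and the objective is (1/2n)||y - Xb||^2 + (eta/2n)||y_prior - Xb||^2 + lam*||b||_1. The solver is plain coordinate descent, swapped in for simplicity, not the LARS-type path procedure the abstract mentions.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used in LASSO coordinate updates."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def plasso_linear(X, y, y_prior, lam, eta, n_sweeps=200):
    """Coordinate descent for an ASSUMED prior-augmented LASSO objective:

        (1/2n)||y - Xb||^2 + (eta/2n)||y_prior - Xb||^2 + lam * ||b||_1

    X is a list of rows; y and y_prior are lists of length n.
    The names y_prior and eta are hypothetical, not taken from the paper.
    """
    n, p = len(X), len(X[0])
    # The two quadratic terms collapse into one quadratic around a blended
    # response, scaled by (1 + eta); eta = 0 gives the ordinary LASSO.
    y_tilde = [(yi + eta * pi) / (1.0 + eta) for yi, pi in zip(y, y_prior)]
    beta = [0.0] * p
    resid = list(y_tilde)  # residual y_tilde - X @ beta, with beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Add coordinate j back to form the partial residual.
            for i in range(n):
                resid[i] += X[i][j] * beta[j]
            rho = (1.0 + eta) * sum(X[i][j] * resid[i] for i in range(n)) / n
            z = (1.0 + eta) * col_sq[j] / n
            beta[j] = soft_threshold(rho, lam) / z
            # Remove the updated coordinate from the residual again.
            for i in range(n):
                resid[i] -= X[i][j] * beta[j]
    return beta


if __name__ == "__main__":
    X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    y = [3.0, 0.0, -3.0]
    y_prior = [3.0, 3.0, -3.0]  # hypothetical prior-based predictions
    print(plasso_linear(X, y, y_prior, lam=0.5, eta=0.0))  # ordinary LASSO
    print(plasso_linear(X, y, y_prior, lam=0.5, eta=1.0))  # prior included
```

Under this assumed form, `eta` controls how much weight the prior receives: `eta = 0` ignores it entirely, while larger values pull the fit toward `y_prior`. In the toy example, the second coefficient is zero without the prior and becomes nonzero once the prior predictions are given weight.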