Jianqing Fan, Jinchi Lv
Frederick L. Moore '18 Professor of Finance, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA
Stat Sin. 2010 Jan;20(1):101-148.
High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discovery. Traditional best subset selection, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality, and they have been widely applied to simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of recent developments in the theory, methods, and implementation of high dimensional variable selection. Questions of what limits of dimensionality such methods can handle, what role penalty functions play, and what statistical properties the resulting estimators possess are rapidly driving advances in the field. The properties of non-concave penalized likelihood and its role in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods.
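The two-scale strategy sketched in the abstract (reduce dimensionality by marginal independence screening, then apply a penalized-likelihood fit on the surviving variables) can be illustrated in a few lines of NumPy. This is a minimal sketch under a linear model, not the paper's implementation: the helper names (`sis_screen`, `lasso_cd`), the L1 penalty used in place of a general non-concave penalty, and all tuning constants (`d=20`, `lam=0.1`) are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, lam):
    # S(z, lam) = sign(z) * max(|z| - lam, 0): closed-form minimizer of the
    # one-dimensional L1-penalized least squares problem.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sis_screen(X, y, d):
    # Independence screening: rank predictors by absolute marginal
    # correlation with the response and keep the top d (illustrative).
    corr = np.abs(X.T @ y) / (np.linalg.norm(X, axis=0) * np.linalg.norm(y))
    return np.argsort(corr)[::-1][:d]

def lasso_cd(X, y, lam, n_iter=200):
    # Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam ||b||_1,
    # assuming (approximately) standardized columns of X.
    n, p = X.shape
    beta = np.zeros(p)
    r = y.copy()                      # current residual y - X beta
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]    # partial residual without feature j
            beta[j] = soft_threshold(X[:, j] @ r / n, lam) / (X[:, j] @ X[:, j] / n)
            r -= X[:, j] * beta[j]
    return beta

# Toy simulation: p far exceeds n, only three variables are active.
rng = np.random.default_rng(0)
n, p = 100, 1000
X = rng.standard_normal((n, p))
true_support = np.array([0, 1, 2])
y = X[:, true_support] @ np.array([3.0, 3.0, 3.0]) + 0.5 * rng.standard_normal(n)

keep = sis_screen(X, y, d=20)         # scale 1: crude screening to d << p
beta_hat = np.zeros(p)
beta_hat[keep] = lasso_cd(X[:, keep], y, lam=0.1)  # scale 2: penalized fit
```

In this toy setting, screening shrinks the problem from 1000 candidate predictors to 20 before any joint fitting is attempted, which is the computational point of the two-scale idea; the penalized fit then does the fine-grained selection among the survivors.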