Taylor Jonathan, Tibshirani Robert J
Department of Statistics, Stanford University, Stanford, CA 94305;
Department of Health Research & Policy and Department of Statistics, Stanford University, Stanford, CA 94305
Proc Natl Acad Sci U S A. 2015 Jun 23;112(25):7629-34. doi: 10.1073/pnas.1507583112.
We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked" (searched for the strongest associations) means that we must set a higher bar for declaring the associations that we see significant. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.
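The need for a higher bar after cherry-picking can be illustrated with a small simulation. The sketch below is not the method of the paper; it assumes a simplified setting with independent standard normal test statistics and no real associations, picks the largest one, and compares the naive p-value with a Bonferroni-adjusted one (a simple, conservative stand-in for a selection-aware correction). The function name `selection_error_rates` and all parameter values are hypothetical choices made for illustration.

```python
import math

import numpy as np


def selection_error_rates(n_tests=20, n_sims=20000, alpha=0.05, seed=0):
    """Estimate false-positive rates after picking the strongest of n_tests null effects.

    Assumption: every test statistic is an independent N(0, 1) draw,
    so no association is real and any rejection is a false positive.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_sims, n_tests))

    # Cherry-pick: keep only the largest |z| in each simulated dataset.
    top = np.abs(z).max(axis=1)

    # Naive two-sided p-value, ignoring selection: P(|Z| > top) = erfc(top / sqrt(2)).
    naive_p = np.array([math.erfc(t / math.sqrt(2.0)) for t in top])

    # Bonferroni adjustment: multiply by the number of candidates searched.
    bonferroni_p = np.minimum(1.0, n_tests * naive_p)

    naive_rate = float((naive_p < alpha).mean())
    bonferroni_rate = float((bonferroni_p < alpha).mean())
    return naive_rate, bonferroni_rate


naive_rate, bonferroni_rate = selection_error_rates()
print(f"naive false-positive rate:      {naive_rate:.3f}")
print(f"Bonferroni false-positive rate: {bonferroni_rate:.3f}")
```

With 20 independent candidates, theory gives a naive false-positive rate of 1 - 0.95^20, roughly 0.64, far above the nominal 0.05, while the adjusted rate stays at or below 0.05. Bonferroni is used here only because it is easy to verify; it makes no use of how the selection was actually performed.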