Fan Yingying, Demirkaya Emre, Lv Jinchi
Data Sciences and Operations Department, University of Southern California, Los Angeles, CA 90089, USA.
Business Analytics & Statistics, The University of Tennessee, Knoxville, Knoxville, TN 37996-4140, USA.
J Mach Learn Res. 2019;20.
Evaluating the joint significance of covariates is of fundamental importance in a wide range of applications. To this end, p-values are frequently employed and produced by algorithms that are powered by classical large-sample asymptotic theory. It is well known that the conventional p-values in the Gaussian linear model are valid even when the dimensionality is a non-vanishing fraction of the sample size, but can break down when the design matrix becomes singular in higher dimensions or when the error distribution deviates from Gaussianity. A natural question is when the conventional p-values in generalized linear models become invalid in diverging dimensions. We establish that such a breakdown can occur early in nonlinear models. Our theoretical characterizations are confirmed by simulation studies.