Kang Jian, Hong Hyokyoung G, Li Y I
Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, Michigan 48109, U.S.A.
Department of Statistics and Probability, Michigan State University, 619 Red Cedar Rd, East Lansing, Michigan 48823, U.S.A.
Biometrika. 2017 Nov;104(4):785-800. doi: 10.1093/biomet/asx052. Epub 2017 Oct 9.
Traditional variable selection methods are compromised by overlooking useful information on covariates with similar functionality or spatial proximity, and by treating each covariate independently. Leveraging prior grouping information on covariates, we propose partition-based screening methods for ultrahigh-dimensional variables in the framework of generalized linear models. We show that partition-based screening exhibits the sure screening property with a vanishing false selection rate. For settings where prior knowledge on covariate grouping is unavailable or unreliable, we propose a data-driven partition screening framework and investigate its theoretical properties. We consider two special cases: correlation-guided partitioning and spatial location-guided partitioning. In the absence of a single partition, we propose a theoretically justified strategy for combining statistics from various partitioning methods. The utility of the proposed methods is demonstrated via simulation and analysis of functional neuroimaging data.
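The idea of screening within a given partition can be illustrated with a minimal sketch. The abstract does not state the screening statistic, so everything below is an assumption made for illustration: the model is restricted to the Gaussian linear case of a generalized linear model, each group is fitted jointly by least squares, the statistic is the absolute fitted coefficient, and the cutoff `threshold` is user-chosen. The names `partition_screen`, `groups`, and `threshold` are hypothetical and do not come from the paper.

```python
import numpy as np


def partition_screen(X, y, groups, threshold):
    """Group-wise screening sketch (assumed form, not the paper's exact method).

    X: (n, p) covariate matrix; y: (n,) response.
    groups: list of index arrays that partition the p columns.
    threshold: assumed user-chosen cutoff on the absolute coefficient.
    """
    n, p = X.shape
    stats = np.zeros(p)
    for idx in groups:
        idx = np.asarray(idx)
        # Fit the covariates of one group jointly, with an intercept.
        design = np.column_stack([np.ones(n), X[:, idx]])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        # Screening statistic: absolute fitted coefficient (intercept dropped).
        stats[idx] = np.abs(coef[1:])
    selected = np.flatnonzero(stats >= threshold)
    return selected, stats


# Simulated data: 50 covariates, only the first two carry signal.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[0, 1]] = 2.0
y = X @ beta + 0.1 * rng.standard_normal(n)

# Assumed prior partition: consecutive blocks of five covariates.
groups = [np.arange(start, start + 5) for start in range(0, p, 5)]
selected, stats = partition_screen(X, y, groups, threshold=1.0)
print(selected)
```

Fitting each group jointly, rather than one covariate at a time, is what lets the grouping information enter the statistic. A data-driven variant would replace the fixed `groups` with a partition derived from the data, for example by clustering covariates on their sample correlations.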