Li Jialu, Yu Guan, Li Qizhai, Liu Yufeng
School of Mathematics and Statistics, Beijing Institute of Technology.
Department of Biostatistics, State University of New York at Buffalo.
J Comput Graph Stat. 2023;32(1):263-274. doi: 10.1080/10618600.2022.2070172. Epub 2022 May 26.
Modern high-dimensional statistical inference often faces the problem of missing data. In recent decades, many studies have addressed this problem with strategies such as complete-sample analysis and imputation procedures. However, complete-sample analysis discards the information contained in incomplete samples, while imputation procedures accumulate the errors of each individual imputation. In this paper, we propose a new method, the Sample-wise COmbined missing effect Model with penalization (SCOM), to deal with missing data in the predictors. Instead of imputing the predictors, SCOM estimates, for each incomplete sample, the combined effect of all of that sample's missing data. SCOM thus makes full use of all available data and is robust to various missing-data mechanisms. Theoretically, we establish an oracle inequality for the proposed estimator, as well as consistency of both variable selection and combined missing effect selection. Simulation studies and an application to the Residential Building Data illustrate the effectiveness of SCOM.
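The idea of estimating one combined effect per incomplete sample, rather than imputing each missing predictor, can be illustrated with a small sketch. This is not the authors' estimator: the abstract does not state the model or the penalty, so everything below is an assumption made for illustration. The sketch assumes a linear model, zero-fills the missing entries, gives every incomplete sample its own offset `gamma_i`, and puts an L1 penalty on both the coefficients and the offsets, fitted by plain proximal gradient descent. The function name `fit_combined_effect_sketch` and the parameters `lam_beta` and `lam_gamma` are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_combined_effect_sketch(X, y, lam_beta=0.1, lam_gamma=0.1, n_iter=2000):
    """Illustrative only (assumed model, not the published SCOM estimator).

    Minimizes (1/2n) * ||y - X0 @ beta - gamma||^2
              + lam_beta * ||beta||_1 + lam_gamma * ||gamma||_1,
    where X0 is X with missing entries set to zero and gamma_i is
    allowed to be nonzero only for samples with at least one missing entry.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    incomplete = np.isnan(X).any(axis=1)     # rows with any missing predictor
    idx = np.flatnonzero(incomplete)
    X0 = np.where(np.isnan(X), 0.0, X)       # zero-fill; no imputation model

    # One indicator column per incomplete sample carries its combined effect.
    D = np.zeros((n, idx.size))
    D[idx, np.arange(idx.size)] = 1.0
    Z = np.hstack([X0, D])

    lam = np.concatenate([np.full(p, lam_beta), np.full(idx.size, lam_gamma)])
    L = np.linalg.norm(Z, 2) ** 2 / n        # Lipschitz constant of the gradient

    theta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        grad = Z.T @ (Z @ theta - y) / n
        theta = soft_threshold(theta - grad / L, lam / L)

    beta = theta[:p]
    gamma = np.zeros(n)
    gamma[idx] = theta[p:]                   # complete samples keep gamma_i = 0
    return beta, gamma
```

In this sketch, a nonzero `beta[j]` corresponds to a selected variable and a nonzero `gamma[i]` to a selected combined missing effect. This mirrors the two kinds of selection the abstract mentions, but only under the assumptions stated above.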