Fan Yingying, Lv Jinchi, Sharifvaghefi Mahrad, Uematsu Yoshimasa
University of Southern California.
Tohoku University.
J Am Stat Assoc. 2020;115(532):1822-1834. doi: 10.1080/01621459.2019.1654878. Epub 2019 Sep 17.
Interpretability and stability are two important features desired in many contemporary big data applications arising in statistics, economics, and finance. While the former is enjoyed to some extent by many existing forecasting approaches, the latter, in the sense of controlling the fraction of wrongly discovered features (a control that can greatly enhance interpretability), remains largely underdeveloped. To this end, in this paper we exploit the general framework of model-X knockoffs introduced recently in Candès, Fan, Janson and Lv (2018), which is nonconventional for reproducible large-scale inference in that it is completely free of the use of p-values for significance testing, and we suggest a new method of intertwined probabilistic factors decoupling (IPAD) for stable interpretable forecasting with knockoffs inference in high-dimensional models. The key idea of the method is to construct the knockoff variables by assuming, for the association structure of the covariates, a latent factor model of the kind widely used in economics and finance. Our method and work are distinct from the existing literature in four respects: we estimate the covariate distribution from data instead of assuming that it is known when constructing the knockoff variables, our procedure does not require any sample splitting, we provide theoretical justifications for the asymptotic false discovery rate control, and we also establish the theory for the power analysis. Several simulation examples and a real data analysis further demonstrate that the newly suggested method has appealing finite-sample performance, with the desired interpretability and stability, compared to some popularly used forecasting methods.
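The abstract describes the recipe only at a high level. The sketch below, assuming NumPy, illustrates one way the pieces could fit together: a PCA (top-r SVD) estimate of the latent factor model gives the common component, fresh Gaussian noise with the estimated error variances stands in for the idiosyncratic part, and the knockoff+ threshold from the model-X knockoffs framework selects features at a target FDR level q. The helper names (`factor_knockoffs`, `marginal_stats`, `knockoff_plus_threshold`), the Gaussian-noise assumption, and the simple marginal-correlation statistic are all illustrative choices, not the paper's exact construction.

```python
import numpy as np


def factor_knockoffs(X, r, rng):
    """Hypothetical helper: knockoffs under an assumed factor model X = F Lambda^T + E.

    The common component is estimated by a rank-r SVD of the centered data; the
    idiosyncratic part is replaced by independent Gaussian noise with the estimated
    column variances (assumption: E has independent Gaussian columns).
    """
    n, p = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu                                    # center each covariate
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    C_hat = (U[:, :r] * s[:r]) @ Vt[:r]            # estimated common component
    E_hat = Xc - C_hat                             # estimated idiosyncratic errors
    sigma = E_hat.std(axis=0, ddof=1)              # per-column error scale
    E_tilde = rng.standard_normal((n, p)) * sigma  # independent copy of the errors
    return mu + C_hat + E_tilde


def marginal_stats(X, X_tilde, y):
    """Stand-in statistic W_j = |X_j^T y| - |X~_j^T y|; swapping X_j and X~_j flips its sign."""
    yc = y - y.mean()
    return np.abs(X.T @ yc) - np.abs(X_tilde.T @ yc)


def knockoff_plus_threshold(W, q):
    """Knockoff+ threshold: smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf                                  # no feasible threshold: select nothing


if __name__ == "__main__":
    # Simulated data from a two-factor model with 10 relevant covariates.
    rng = np.random.default_rng(0)
    n, p, r = 300, 100, 2
    F = rng.standard_normal((n, r))
    Lam = rng.standard_normal((p, r))
    X = F @ Lam.T + rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:10] = 1.0
    y = X @ beta + rng.standard_normal(n)

    X_tilde = factor_knockoffs(X, r, rng)
    W = marginal_stats(X, X_tilde, y)
    T = knockoff_plus_threshold(W, q=0.2)
    print("threshold:", T, "selected:", np.flatnonzero(W >= T))
```

Any statistic with the same sign-flip property, such as a difference of lasso coefficient magnitudes fitted on the augmented design `[X, X_tilde]`, could replace the marginal one; the threshold step is unchanged.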