IPAD: Stable Interpretable Forecasting with Knockoffs Inference.

Authors

Fan Yingying, Lv Jinchi, Sharifvaghefi Mahrad, Uematsu Yoshimasa

Affiliations

University of Southern California.

Tohoku University.

Publication information

J Am Stat Assoc. 2020;115(532):1822-1834. doi: 10.1080/01621459.2019.1654878. Epub 2019 Sep 17.

DOI:10.1080/01621459.2019.1654878

PMID:33716359

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7954402/

Abstract

Interpretability and stability are two important features that are desired in many contemporary big data applications arising in statistics, economics, and finance. While the former is enjoyed to some extent by many existing forecasting approaches, the latter in the sense of controlling the fraction of wrongly discovered features which can enhance greatly the interpretability is still largely underdeveloped. To this end, in this paper we exploit the general framework of model-X knockoffs introduced recently in Candès, Fan, Janson and Lv (2018), which is nonconventional for reproducible large-scale inference in that the framework is completely free of the use of p-values for significance testing, and suggest a new method of intertwined probabilistic factors decoupling (IPAD) for stable interpretable forecasting with knockoffs inference in high-dimensional models. The recipe of the method is constructing the knockoff variables by assuming a latent factor model that is exploited widely in economics and finance for the association structure of covariates. Our method and work are distinct from the existing literature in that we estimate the covariate distribution from data instead of assuming that it is known when constructing the knockoff variables, our procedure does not require any sample splitting, we provide theoretical justifications on the asymptotic false discovery rate control, and the theory for the power analysis is also established. Several simulation examples and the real data analysis further demonstrate that the newly suggested method has appealing finite-sample performance with desired interpretability and stability compared to some popularly used forecasting methods.
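The abstract refers to controlling the fraction of wrongly discovered features without p-values. As an illustration only, the following is a minimal sketch of the knockoff+ selection rule commonly used in the knockoffs framework, assuming that a knockoff statistic W_j has already been computed for each feature (large positive values favor the original feature over its knockoff copy). It does not implement IPAD's factor-model construction of the knockoff variables; the function names and the example values are hypothetical and are not taken from the paper.

```python
def knockoff_plus_threshold(w, q):
    """Return the smallest t among the nonzero |w_j| whose estimated
    false discovery proportion (1 + #{w_j <= -t}) / max(#{w_j >= t}, 1)
    is at most q. Returns float('inf') if no threshold qualifies."""
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # proxy for false discoveries
        positives = sum(1 for x in w if x >= t)   # features that would be selected
        if (1 + negatives) / max(positives, 1) <= q:
            return t
    return float("inf")


def select_features(w, q):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical statistics for 10 features (illustrative values only).
w = [5.2, 4.1, -0.3, 3.8, 0.0, 2.9, -0.7, 3.3, 0.4, 2.5]
selected = select_features(w, q=0.2)
print(selected)
```

The count of large negative statistics stands in for the unknown number of false discoveries, which is why no p-values are needed; the extra 1 in the numerator makes the rule more conservative when few features are selected.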


Similar articles

RANK: Large-Scale Inference with Graphical Nonlinear Knockoffs.

J Am Stat Assoc. 2020;115(529):362-379. doi: 10.1080/01621459.2018.1546589. Epub 2019 Apr 11.

Kernel Knockoffs Selection for Nonparametric Additive Models.

J Am Stat Assoc. 2023;118(543):2158-2170. doi: 10.1080/01621459.2022.2039671. Epub 2022 Mar 14.

Knockoff boosted tree for model-free variable selection.

Bioinformatics. 2021 May 17;37(7):976-983. doi: 10.1093/bioinformatics/btaa770.

DeepLINK: Deep learning inference using knockoffs with applications to genomics.

Proc Natl Acad Sci U S A. 2021 Sep 7;118(36). doi: 10.1073/pnas.2104683118.

Deep direct likelihood knockoffs.

Adv Neural Inf Process Syst. 2020 Dec;33:5036-5046.

Gene hunting with hidden Markov model knockoffs.

Biometrika. 2019 Mar;106(1):1-18. doi: 10.1093/biomet/asy033. Epub 2018 Aug 4.

Sparse regression and marginal testing using cluster prototypes.

Biostatistics. 2016 Apr;17(2):364-76. doi: 10.1093/biostatistics/kxv049. Epub 2015 Nov 27.

Competition-based control of the false discovery proportion.

Biometrics. 2023 Dec;79(4):3472-3484. doi: 10.1111/biom.13830. Epub 2023 Jan 30.

False discovery rate-controlled multiple testing for union null hypotheses: a knockoff-based approach.

Biometrics. 2023 Dec;79(4):3497-3509. doi: 10.1111/biom.13848. Epub 2023 Mar 15.

Cited by

Uncovering Heterogeneous Effects via Localized Feature Selection.

bioRxiv. 2025 Jun 7:2025.06.03.657761. doi: 10.1101/2025.06.03.657761.

Longitudinal Microbiome-based Interpretable Machine Learning for Identification of Time-Varying Biomarkers in Early Prediction of Disease Outcomes.

bioRxiv. 2024 Nov 20:2024.10.18.619118. doi: 10.1101/2024.10.18.619118.

Summary statistics knockoffs inference with family-wise error rate control.

Biometrics. 2024 Jul 1;80(3). doi: 10.1093/biomtc/ujae082.

Searching for robust associations with a multi-environment knockoff filter.

Biometrika. 2022 Sep;109(3):611-629. doi: 10.1093/biomet/asab055. Epub 2021 Nov 2.

DIET: Conditional independence testing with marginal dependence measures of residual information.

Proc Mach Learn Res. 2023 Apr;206:10343-10367.

Statistical inference for linear mediation models with high-dimensional mediators and application to studying stock reaction to COVID-19 pandemic.

J Econom. 2023 Jul;235(1):166-179. doi: 10.1016/j.jeconom.2022.03.001. Epub 2022 Apr 8.

Asymptotic Theory of Eigenvectors for Random Matrices with Diverging Spikes.

J Am Stat Assoc. 2022;117(538):996-1009. doi: 10.1080/01621459.2020.1840990. Epub 2020 Dec 8.

Null-free False Discovery Rate Control Using Decoy Permutations.

Acta Math Appl Sin. 2022;38(2):235-253. doi: 10.1007/s10255-022-1077-5. Epub 2022 Apr 9.

DeepLINK: Deep learning inference using knockoffs with applications to genomics.

Proc Natl Acad Sci U S A. 2021 Sep 7;118(36). doi: 10.1073/pnas.2104683118.

Utilizing machine learning with knockoff filtering to extract significant metabolites in Crohn's disease with a publicly available untargeted metabolomics dataset.

PLoS One. 2021 Jul 29;16(7):e0255240. doi: 10.1371/journal.pone.0255240. eCollection 2021.
