Unified model-free interaction screening via CV-entropy filter.

Authors

Xiong Wei, Chen Yaxian, Ma Shuangge

Affiliations

School of Statistics, University of International Business and Economics, Beijing 100872, PR China.

Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong.

Publication information

Comput Stat Data Anal. 2023 Apr;180. doi: 10.1016/j.csda.2022.107684. Epub 2022 Dec 28.

DOI:10.1016/j.csda.2022.107684

PMID:36910335

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9997997/

Abstract

For many practical high-dimensional problems, interactions have been increasingly found to play important roles beyond main effects. A representative example is gene-gene interaction. Joint analysis, which analyzes all interactions and main effects in a single model, can be seriously challenged by high dimensionality. For high-dimensional data analysis in general, marginal screening has been established as effective for reducing computational cost, increasing stability, and improving estimation/selection performance. Most of the existing marginal screening methods are designed for the analysis of main effects only. The existing screening methods for interaction analysis are often limited by making stringent model assumptions, lacking robustness, and/or requiring predictors to be continuous (and hence lacking flexibility). A unified marginal screening approach tailored to interaction analysis is developed, which can be applied to regression, classification, and survival analysis. Predictors are allowed to be continuous and discrete. The proposed approach is built on Coefficient of Variation (CV) filters based on information entropy. Statistical properties are rigorously established. It is shown that the CV filters are almost insensitive to the distribution tails of predictors, correlation structure among predictors, and sparsity level of signals. An efficient two-stage algorithm is developed to make the proposed approach scalable to ultrahigh-dimensional data. Simulations and the analysis of TCGA LUAD data further establish the practical superiority of the proposed approach.
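The abstract does not define the CV-entropy filter itself, so the following is only a minimal sketch of the general idea of entropy-based marginal interaction screening, not the authors' method. It swaps in a simpler, well-known score (interaction information computed from plug-in entropies) and ranks predictor pairs by it. The function names (`entropy`, `joint`, `mutual_info`, `discretize`, `screen_pairs`), the median-split discretization, and the toy data are all assumptions made for illustration.

```python
import itertools
import numpy as np


def entropy(labels):
    """Empirical (plug-in) Shannon entropy, in nats, of an integer label array."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log(probs)))


def joint(*columns):
    """Encode several integer label arrays as a single joint label array."""
    stacked = np.column_stack(columns)
    _, codes = np.unique(stacked, axis=0, return_inverse=True)
    return codes.ravel()


def mutual_info(a, b):
    """Plug-in mutual information I(a; b) = H(a) + H(b) - H(a, b)."""
    return entropy(a) + entropy(b) - entropy(joint(a, b))


def discretize(x, n_bins=2):
    """Hypothetical helper: cut a continuous predictor into quantile bins."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(edges, x)


def screen_pairs(X, y, top=5):
    """Rank pairs (j, k) by I(y; Xj, Xk) - I(y; Xj) - I(y; Xk).

    This is a stand-in score, NOT the CV-entropy filter from the paper.
    """
    main = [mutual_info(X[:, j], y) for j in range(X.shape[1])]
    scores = {}
    for j, k in itertools.combinations(range(X.shape[1]), 2):
        pair_info = mutual_info(joint(X[:, j], X[:, k]), y)
        scores[(j, k)] = pair_info - main[j] - main[k]
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Toy data (assumed): 8 median-split predictors, response driven purely
# by an interaction between predictors 0 and 1, with no main effects.
rng = np.random.default_rng(0)
n, p = 2000, 8
X = np.column_stack([discretize(rng.normal(size=n)) for _ in range(p)])
y = (X[:, 0] == X[:, 1]).astype(int)
print(screen_pairs(X, y, top=3))
```

On this toy example the pair (0, 1) should rank first, because the response is a deterministic function of that pair while each predictor alone carries essentially no information about it. Scoring every pair costs on the order of p² evaluations, which is the kind of burden the two-stage algorithm mentioned in the abstract is meant to reduce for ultrahigh-dimensional data.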
