Network-based penalized regression with application to genomic data.

Authors

Kim Sunkyung, Pan Wei, Shen Xiaotong

Affiliation

Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55405, U.S.A.

Publication information

Biometrics. 2013 Sep;69(3):582-93. doi: 10.1111/biom.12035. Epub 2013 Jul 3.

DOI:10.1111/biom.12035

PMID:23822182

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4007772/

Abstract

Penalized regression approaches are attractive in dealing with high-dimensional data such as arising in high-throughput genomic studies. New methods have been introduced to utilize the network structure of predictors, for example, gene networks, to improve parameter estimation and variable selection. All the existing network-based penalized methods are based on an assumption that parameters, for example, regression coefficients, of neighboring nodes in a network are close in magnitude, which however may not hold. Here we propose a novel penalized regression method based on a weaker prior assumption that the parameters of neighboring nodes in a network are likely to be zero (or non-zero) at the same time, regardless of their specific magnitudes. We propose a novel non-convex penalty function to incorporate this prior, and an algorithm based on difference convex programming. We use simulated data and two breast cancer gene expression datasets to demonstrate the advantages of the proposed methods over some existing methods. Our proposed methods can be applied to more general problems for group variable selection.
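The contrast drawn in the abstract, between penalizing differences in magnitude and penalizing disagreement in zero/non-zero status, can be illustrated with a small sketch. The abstract does not give the penalty's formula, so everything below is an assumption made for illustration: the function names, the toy network, the threshold `tau`, and the use of a truncated L1 function `min(|z|/tau, 1)` as a continuous stand-in for the indicator `I(z != 0)`. The last function shows why such a surrogate suits difference convex programming: it can be written as one convex function minus another. This is not the authors' exact penalty or algorithm.

```python
import numpy as np

def tlp(z, tau):
    """Truncated L1 surrogate for the indicator I(z != 0): min(|z|/tau, 1).
    (Assumed surrogate for illustration; not quoted from the abstract.)"""
    return np.minimum(np.abs(z) / tau, 1.0)

def magnitude_penalty(beta, edges):
    """Smoothness-type penalty: sum over edges of |beta_i - beta_j|.
    Small only when neighboring coefficients are close in magnitude."""
    return sum(abs(beta[i] - beta[j]) for i, j in edges)

def indicator_penalty(beta, edges, tau=0.1):
    """Sum over edges of |J(beta_i) - J(beta_j)|, with J = tlp.
    Small when neighbors are zero (or non-zero) together,
    regardless of how different their magnitudes are."""
    j = tlp(beta, tau)
    return sum(abs(j[a] - j[b]) for a, b in edges)

def tlp_dc_parts(z, tau):
    """Split tlp into a difference of two convex functions:
    tlp(z) = |z|/tau - max(|z|/tau - 1, 0)."""
    convex_1 = np.abs(z) / tau
    convex_2 = np.maximum(np.abs(z) / tau - 1.0, 0.0)
    return convex_1, convex_2

# Hypothetical toy network: two separate gene pairs, 0-1 and 2-3.
edges = [(0, 1), (2, 3)]

# Neighbors share zero/non-zero status but differ greatly in magnitude.
beta_agree = np.array([5.0, 0.5, 0.0, 0.0])
# Each pair has one zero and one non-zero coefficient.
beta_disagree = np.array([5.0, 0.0, 0.5, 0.0])

mag_agree = magnitude_penalty(beta_agree, edges)
ind_agree = indicator_penalty(beta_agree, edges)
mag_disagree = magnitude_penalty(beta_disagree, edges)
ind_disagree = indicator_penalty(beta_disagree, edges)

print("magnitude penalty:", mag_agree, mag_disagree)
print("indicator penalty:", ind_agree, ind_disagree)

# Numerical check of the difference-of-convex decomposition.
grid = np.linspace(-1.0, 1.0, 21)
c1, c2 = tlp_dc_parts(grid, 0.1)
dc_ok = np.allclose(c1 - c2, tlp(grid, 0.1))
print("DC decomposition holds on grid:", dc_ok)
```

In this toy case the magnitude-based penalty charges `beta_agree` heavily even though both neighbors are selected together, while the indicator-based penalty charges it nothing and penalizes only `beta_disagree`, where support differs across an edge.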
