Overlapping Group Logistic Regression with Applications to Genetic Pathway Selection

Authors

Zeng Yaohui, Breheny Patrick

Affiliation

Department of Biostatistics, University of Iowa, Iowa City, IA, USA.

Publication information

Cancer Inform. 2016 Sep 15;15:179-87. doi: 10.4137/CIN.S40043. eCollection 2016.

DOI:10.4137/CIN.S40043

PMID:27679461

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5026200/

Abstract

Discovering important genes that account for the phenotype of interest has long been a challenge in genome-wide expression analysis. Analyses such as gene set enrichment analysis (GSEA) that incorporate pathway information have become widespread in hypothesis testing, but pathway-based approaches have been largely absent from regression methods due to the challenges of dealing with overlapping pathways and the resulting lack of available software. The R package grpreg is widely used to fit group lasso and other group-penalized regression models; in this study, we develop an extension, grpregOverlap, to allow for overlapping group structure using a latent variable approach. We compare this approach to the ordinary lasso and to GSEA using both simulated and real data. We find that incorporation of prior pathway information can substantially improve the accuracy of gene expression classifiers, and we shed light on several ways in which hypothesis-testing approaches such as GSEA differ from regression approaches with respect to the analysis of pathway data.
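The latent variable approach mentioned above can be illustrated with a small sketch. A common way to handle overlapping groups is to duplicate each predictor once for every group that contains it, fit an ordinary (non-overlapping) group-penalized model on the expanded design, and then sum the latent coefficients back onto the original predictors. The Python code below shows only that expansion and collapse step on toy data. The function names `expand_overlapping_groups` and `collapse_coefficients`, the toy pathways, and the coefficient values are hypothetical and chosen for illustration; this is not the grpregOverlap API, and no model is fitted here.

```python
import numpy as np

def expand_overlapping_groups(X, groups):
    """Duplicate columns so that overlapping groups become disjoint.

    X      : (n, p) array of predictors (e.g. gene expression values)
    groups : list of lists of column indices; a column may sit in several groups
    Returns the expanded matrix, the group label of each expanded column,
    and the original column index of each expanded column.
    """
    cols, labels = [], []
    for g, members in enumerate(groups):
        cols.extend(members)
        labels.extend([g] * len(members))
    cols = np.asarray(cols)
    labels = np.asarray(labels)
    return X[:, cols], labels, cols

def collapse_coefficients(gamma, cols, p):
    """Sum latent coefficients back onto the original p predictors."""
    beta = np.zeros(p)
    np.add.at(beta, cols, gamma)  # accumulates correctly for repeated indices
    return beta

# Toy data (assumption): 4 genes, 2 pathways that share gene 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
pathways = [[0, 1, 2], [2, 3]]

X_expanded, group_labels, origin = expand_overlapping_groups(X, pathways)

# Hypothetical latent coefficients, one per expanded column.
gamma = np.array([0.5, 0.0, 1.0, -0.25, 2.0])
beta = collapse_coefficients(gamma, origin, X.shape[1])
```

In this sketch the expanded design has five columns for four genes, and the linear predictor is unchanged by the reparameterization: `X_expanded @ gamma` equals `X @ beta`. A group lasso solver applied to `X_expanded` with `group_labels` would then select or drop whole pathways while a shared gene can remain active through another pathway.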





