glmgraph: an R package for variable selection and predictive modeling of structured genomic data.

Authors

Chen Li, Liu Han, Kocher Jean-Pierre A, Li Hongzhe, Chen Jun

Affiliations

Division of Biomedical Statistics and Informatics and Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905, USA.

Department of Computer Science, Emory University, Atlanta, GA 30322, USA.

Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA and.

Publication information

Bioinformatics. 2015 Dec 15;31(24):3991-3. doi: 10.1093/bioinformatics/btv497. Epub 2015 Aug 26.

DOI: 10.1093/bioinformatics/btv497

PMID: 26315909

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4692967/

Abstract

UNLABELLED

One central theme of modern high-throughput genomic data analysis is to identify relevant genomic features and to build a predictive model based on the selected features for tasks such as personalized medicine. Correlating the large number of 'omics' features with a given phenotype is particularly challenging because of the small sample size (n) and high dimensionality (p). To address this small n, large p problem, various forms of sparse regression models have been proposed that exploit the sparsity assumption. Among these, the network-constrained sparse regression model is of particular interest because it can use the prior graph/network structure in the omics data. Despite its potential usefulness for omics data analysis, no efficient R implementation is publicly available. Here we present an R software package, 'glmgraph', that implements graph-constrained regularization for both sparse linear regression and sparse logistic regression. We implement both the L1 penalty and the minimax concave penalty (MCP) for variable selection, and a Laplacian penalty for coefficient smoothing. An efficient coordinate descent algorithm is used to solve the optimization problem. We demonstrate the use of the package by applying it to a human microbiome dataset for which the phylogenetic structure among bacterial taxa is available.
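To make the combination of penalties concrete, here is a minimal pure-Python sketch of cyclic coordinate descent for the linear-regression case. It assumes the objective (1/(2n))·||y − Xβ||² + P_λ1(β) + (λ2/2)·βᵀLβ. Here L = D − A is the Laplacian of an unweighted feature graph, so βᵀLβ equals the sum over edges (j, k) of (β_j − β_k)², which pulls the coefficients of connected features toward each other. P is either the L1 (lasso) penalty or MCP with concavity parameter γ.

Several things are assumptions rather than facts from the source:
- The exact scaling glmgraph uses internally.
- The function names laplacian, soft_threshold, coordinate_update and graph_cd, which are hypothetical and are not the glmgraph API.
- The logistic-regression case, which the sketch does not cover.

```python
"""Sketch: coordinate descent for graph-constrained sparse linear regression.

Assumed objective (scaling is an assumption, not taken from glmgraph):
    (1/(2n)) * ||y - X b||^2 + P_lam1(b) + (lam2/2) * b' L b
where L = D - A is the graph Laplacian and P is lasso or MCP.
All function names are hypothetical; this is not the glmgraph API.
"""


def laplacian(p, edges):
    """Unnormalized Laplacian L = D - A for an unweighted graph on p nodes."""
    L = [[0.0] * p for _ in range(p)]
    for j, k in edges:
        L[j][k] -= 1.0
        L[k][j] -= 1.0
        L[j][j] += 1.0
        L[k][k] += 1.0
    return L


def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def coordinate_update(b, a, lam1, penalty="lasso", gamma=3.0):
    """Minimize 0.5*a*t^2 - b*t + P(t) over a scalar t."""
    if penalty == "lasso":
        return soft_threshold(b, lam1) / a
    if penalty == "mcp":
        if a <= 1.0 / gamma:
            raise ValueError("MCP update needs a > 1/gamma to stay convex")
        if abs(b) <= a * gamma * lam1:
            # |t| <= gamma*lam1: MCP adds -t^2/(2*gamma), so rescale S(b, lam1)
            return soft_threshold(b, lam1) / (a - 1.0 / gamma)
        return b / a  # MCP is constant beyond gamma*lam1: no shrinkage
    raise ValueError(f"unknown penalty: {penalty}")


def graph_cd(X, y, L, lam1, lam2, penalty="lasso", gamma=3.0,
             max_iter=1000, tol=1e-8):
    """Cyclic coordinate descent; X is a list of n rows with p entries."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    r = [float(yi) for yi in y]  # residual y - X beta (beta starts at 0)
    v = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # z = x_j' r_(-j) / n: fit of feature j to the partial residual
            z = sum(X[i][j] * r[i] for i in range(n)) / n + v[j] * beta[j]
            # coupling to graph neighbours via the off-diagonal entries L_jk
            coupling = sum(L[j][k] * beta[k] for k in range(p) if k != j)
            a = v[j] + lam2 * L[j][j]
            new = coordinate_update(z - lam2 * coupling, a, lam1,
                                    penalty, gamma)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta  # keep the residual in sync
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


if __name__ == "__main__":
    X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
    y = [3, 1, -1, -3]
    L = laplacian(2, [(0, 1)])  # one edge linking feature 0 and feature 1
    print(graph_cd(X, y, L, lam1=0.0, lam2=1.0))  # close to [5/3, 4/3]
```

In the usage example the design is orthogonal, so the unpenalized estimates are (2, 1). The single edge in the graph pulls these toward each other, to (5/3, 4/3). A practical implementation would go further than this sketch:
- Standardize the columns.
- Compute a path of λ values with warm starts.
- Handle logistic regression, for example through iteratively reweighted quadratic approximations.

We do not claim these are exactly what glmgraph does.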

AVAILABILITY AND IMPLEMENTATION

'glmgraph' is implemented in R and C++ (Armadillo) and is publicly available on CRAN.
