glmgraph: an R package for variable selection and predictive modeling of structured genomic data.

Author information

Chen Li, Liu Han, Kocher Jean-Pierre A, Li Hongzhe, Chen Jun

Affiliations

Division of Biomedical Statistics and Informatics and Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905, USA, Department of Computer Science, Emory University, Atlanta, GA 30322, USA.

Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA and.

Publication information

Bioinformatics. 2015 Dec 15;31(24):3991-3. doi: 10.1093/bioinformatics/btv497. Epub 2015 Aug 26.

Abstract

UNLABELLED

One central theme of modern high-throughput genomic data analysis is to identify relevant genomic features and to build a predictive model on the selected features for tasks such as personalized medicine. Correlating a large number of 'omics' features with a given phenotype is particularly challenging because the sample size (n) is small and the dimensionality (p) is high. To address this small-n, large-p problem, various sparse regression models have been proposed that exploit the sparsity assumption. Among these, network-constrained sparse regression models are of particular interest because they can use the prior graph/network structure of the omics data. Despite their potential usefulness for omics data analysis, no efficient R implementation is publicly available. Here we present an R package, 'glmgraph', that implements graph-constrained regularization for both sparse linear regression and sparse logistic regression. We implement both the L1 penalty and the minimax concave penalty (MCP) for variable selection, and a Laplacian penalty for coefficient smoothing. An efficient coordinate descent algorithm is used to solve the optimization problem. We demonstrate the package by applying it to a human microbiome dataset in which the phylogenetic structure among bacterial taxa is available.
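To make the combination of penalties concrete, here is a minimal Python sketch of cyclic coordinate descent for the linear (Gaussian) case only. The abstract does not state the exact objective, so this sketch assumes the form (1/(2n))||y - Xb||^2 + sum_j pen(b_j; lam1) + (lam2/2) b^T L b, with pen either the L1 penalty or MCP (concavity parameter gamma), and L the combinatorial Laplacian D - A of the prior graph; the package may use a different scaling or a normalized Laplacian. The function names (graph_laplacian, graph_penalized_cd, and so on) are hypothetical illustrations, not the glmgraph R API or its C++ implementation, and logistic regression is omitted.

```python
import numpy as np


def graph_laplacian(adjacency):
    """Combinatorial Laplacian L = D - A of a symmetric, nonnegative adjacency matrix."""
    A = np.asarray(adjacency, dtype=float)
    return np.diag(A.sum(axis=1)) - A


def soft_threshold(z, lam):
    """Shrink z toward zero by lam (proximal step of lam * |b|)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def coordinate_update(z, a, lam, penalty, gamma):
    """Minimise (a/2) b^2 - z b + pen(b) over the scalar b."""
    if penalty == "l1":
        return soft_threshold(z, lam) / a
    if penalty == "mcp":
        # Inside |b| <= gamma*lam the MCP shrinks; beyond it the penalty is flat.
        if abs(z) <= a * gamma * lam:
            return soft_threshold(z, lam) / (a - 1.0 / gamma)
        return z / a
    raise ValueError("penalty must be 'l1' or 'mcp'")


def graph_penalized_cd(X, y, L, lam1, lam2, penalty="l1", gamma=3.0,
                       max_iter=10000, tol=1e-10):
    """Cyclic coordinate descent for the (assumed) Gaussian objective
        (1/(2n)) ||y - X b||^2 + sum_j pen(b_j; lam1) + (lam2/2) b^T L b.
    For penalty="mcp", each a_j below must exceed 1/gamma.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    L = np.asarray(L, dtype=float)
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.copy()                      # y - X @ beta, with beta = 0
    col_sq = (X ** 2).sum(axis=0) / n     # (1/n) x_j^T x_j
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # (1/n) x_j^T (partial residual with b_j's contribution added back)
            rho = X[:, j] @ resid / n + col_sq[j] * old
            # Laplacian coupling to the other coefficients (off-diagonal part of row j)
            coupling = L[j] @ beta - L[j, j] * old
            z = rho - lam2 * coupling
            a = col_sq[j] + lam2 * L[j, j]
            new = coordinate_update(z, a, lam1, penalty, gamma)
            if new != old:
                resid -= X[:, j] * (new - old)   # keep resid = y - X @ beta
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 50, 5
    X = rng.standard_normal((n, p))
    # Hypothetical chain graph 0-1-2-3-4 standing in for a prior network.
    A = np.diag(np.ones(p - 1), 1)
    A = A + A.T
    y = X @ np.array([1.0, 1.0, 1.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(n)
    L = graph_laplacian(A)
    print(graph_penalized_cd(X, y, L, lam1=0.05, lam2=0.5))
    print(graph_penalized_cd(X, y, L, lam1=0.05, lam2=0.5, penalty="mcp"))
```

The Laplacian term only changes each coordinate's quadratic coefficient a and shifts z by the weighted sum of neighbouring coefficients, so the same one-dimensional thresholding step serves both penalties; with lam1 = 0 the iteration reduces to Gauss-Seidel on (X^T X / n + lam2 L) b = X^T y / n.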

AVAILABILITY AND IMPLEMENTATION

'glmgraph' is implemented in R and C++ (using the Armadillo library) and is publicly available on CRAN.
