

Group spike-and-slab lasso generalized linear models for disease prediction and associated genes detection by incorporating pathway information.

Affiliations

Department of Biostatistics, School of Public Health.

Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases.

Publication information

Bioinformatics. 2018 Mar 15;34(6):901-910. doi: 10.1093/bioinformatics/btx684.

Abstract

MOTIVATION

Large-scale molecular data have been increasingly used as an important resource for prognostic prediction of diseases and detection of associated genes. However, standard approaches for omics data analysis ignore the group structure among genes encoded in functional relationships or pathway information.

RESULTS

We propose new Bayesian hierarchical generalized linear models, called group spike-and-slab lasso GLMs, for predicting disease outcomes and detecting associated genes by incorporating large-scale molecular data and group structures. The proposed model places a mixture double-exponential prior on the coefficients, which induces a self-adaptive amount of shrinkage on each coefficient. The group information is incorporated into the model by setting group-specific parameters. We have developed a fast and stable deterministic algorithm to fit the proposed hierarchical GLMs, which can perform variable selection within groups. We assess the performance of the proposed method in several simulated scenarios by varying the overlap among groups, the group size, the number of non-null groups, and the within-group correlation. Compared with existing methods, the proposed method provides not only more accurate parameter estimates but also better prediction. We further demonstrate the application of the proposed procedure on three cancer datasets by utilizing the pathway structures of genes. Our results show that the proposed method generates powerful models for predicting disease outcomes and detecting associated genes.
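The abstract does not spell out the prior, so the following is only a minimal sketch of how a two-component double-exponential (spike-and-slab) mixture can produce coefficient-specific shrinkage. It assumes the usual spike-and-slab lasso form: a narrow spike scale `s0`, a wide slab scale `s1`, and a group-specific slab probability `theta`. The function names, the numeric settings, and the pathway labels are hypothetical and are not taken from the paper or from the BhGLM package.

```python
import math


def de_density(beta, scale):
    """Double-exponential (Laplace) density centred at 0 with the given scale."""
    return math.exp(-abs(beta) / scale) / (2.0 * scale)


def slab_probability(beta, theta, s0, s1):
    """Conditional probability that beta belongs to the slab component.

    theta is the prior slab probability; in a group model it would be
    shared by all coefficients in the same group (assumption).
    """
    slab = theta * de_density(beta, s1)
    spike = (1.0 - theta) * de_density(beta, s0)
    return slab / (slab + spike)


def effective_penalty(beta, theta, s0, s1):
    """Coefficient-specific L1 penalty weight: (1 - p) / s0 + p / s1."""
    p = slab_probability(beta, theta, s0, s1)
    return (1.0 - p) / s0 + p / s1


if __name__ == "__main__":
    # Hypothetical scales and group-specific slab probabilities.
    s0, s1 = 0.05, 1.0
    group_theta = {"pathway_A": 0.5, "pathway_B": 0.05}
    for name, theta in group_theta.items():
        for beta in (0.01, 0.5):
            weight = effective_penalty(beta, theta, s0, s1)
            print(f"{name}  beta={beta:<5} penalty weight={weight:.3f}")
```

Under these assumptions, a coefficient near zero receives a penalty weight close to `1 / s0` (strong shrinkage), while a large coefficient receives a weight close to `1 / s1` (weak shrinkage). A larger group-level `theta` lowers the penalty for every coefficient in that group, which is one way group information could influence selection within groups.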

AVAILABILITY AND IMPLEMENTATION

The methods have been implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/).

CONTACT

nyi@uab.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.


