Group spike-and-slab lasso generalized linear models for disease prediction and associated genes detection by incorporating pathway information.

Author information

Department of Biostatistics, School of Public Health.

Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases.

Publication information

Bioinformatics. 2018 Mar 15;34(6):901-910. doi: 10.1093/bioinformatics/btx684.

DOI:10.1093/bioinformatics/btx684

PMID:29077795

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5860634/

Abstract

MOTIVATION

Large-scale molecular data have been increasingly used as an important resource for prognostic prediction of diseases and detection of associated genes. However, standard approaches for omics data analysis ignore the group structure among genes encoded in functional relationships or pathway information.

RESULTS

We propose new Bayesian hierarchical generalized linear models, called group spike-and-slab lasso GLMs, for predicting disease outcomes and detecting associated genes by incorporating large-scale molecular data and group structures. The proposed model employs a mixture double-exponential prior on the coefficients, which induces a self-adaptive amount of shrinkage on each coefficient. The group information is incorporated into the model by setting group-specific parameters. We have developed a fast and stable deterministic algorithm to fit the proposed hierarchical GLMs, which can perform variable selection within groups. We assess the performance of the proposed method in several simulated scenarios by varying the overlap among groups, the group size, the number of non-null groups, and the within-group correlation. Compared with existing methods, the proposed method provides not only more accurate parameter estimates but also better prediction. We further demonstrate the application of the proposed procedure on three cancer datasets by utilizing the pathway structures of genes. Our results show that the proposed method generates powerful models for predicting disease outcomes and detecting associated genes.
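To illustrate how a mixture double-exponential prior can give each coefficient its own amount of shrinkage, the following is a minimal sketch, not the BhGLM implementation. The abstract does not spell out the formulas, so the sketch assumes a common spike-and-slab lasso form: each coefficient follows a double-exponential distribution whose scale is either a small spike scale `s0` or a larger slab scale `s1`, with a group-specific slab probability `theta`. All function names, parameter values, and pathway labels are hypothetical.

```python
import math


def de_density(beta, scale):
    """Double-exponential (Laplace) density centered at 0 with the given scale."""
    return math.exp(-abs(beta) / scale) / (2.0 * scale)


def slab_probability(beta, theta, s0, s1):
    """Conditional probability that beta belongs to the slab component.

    theta is the (assumed) group-specific prior probability of the slab.
    """
    slab = theta * de_density(beta, s1)
    spike = (1.0 - theta) * de_density(beta, s0)
    return slab / (slab + spike)


def expected_inverse_scale(beta, theta, s0, s1):
    """Expected 1/scale for beta; it acts as a coefficient-specific L1 penalty."""
    p = slab_probability(beta, theta, s0, s1)
    return (1.0 - p) / s0 + p / s1


# Hypothetical settings: a narrow spike, a wide slab, two pathways.
s0, s1 = 0.04, 1.0
theta_by_group = {"pathway_A": 0.5, "pathway_B": 0.05}

for group, theta in theta_by_group.items():
    for beta in (0.01, 0.5):
        penalty = expected_inverse_scale(beta, theta, s0, s1)
        print(f"{group}: beta={beta:.2f} -> penalty weight {penalty:.3f}")
```

Under these assumptions, a coefficient near zero receives a penalty weight close to `1/s0` (strong shrinkage), while a large coefficient receives a weight close to `1/s1` (weak shrinkage). A group with a larger `theta` penalizes its coefficients less, which is one way group-specific parameters could carry pathway information.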

AVAILABILITY AND IMPLEMENTATION

The methods have been implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/).

CONTACT

nyi@uab.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
