Variational Bayes for high-dimensional proportional hazards models with applications within gene expression.

Affiliations

Department of Mathematics, Imperial College London, London SW7 2AZ, UK.

Department of Surgery and Cancer, Imperial College London, London W12 0NN, UK.

Publication information

Bioinformatics. 2022 Aug 10;38(16):3918-3926. doi: 10.1093/bioinformatics/btac416.

DOI:10.1093/bioinformatics/btac416

PMID:35751586

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9364383/

Abstract

MOTIVATION

Few Bayesian methods for analyzing high-dimensional sparse survival data provide scalable variable selection, effect estimation and uncertainty quantification. Such methods often either sacrifice uncertainty quantification by computing maximum a posteriori estimates, or quantify the uncertainty at high (unscalable) computational expense.

RESULTS

We bridge this gap and develop an interpretable and scalable Bayesian proportional hazards model for prediction and variable selection, referred to as sparse variational Bayes. Our method, based on a mean-field variational approximation, overcomes the high computational cost of Markov chain Monte Carlo, whilst retaining useful features, providing a posterior distribution for the parameters and offering a natural mechanism for variable selection via posterior inclusion probabilities. The performance of our proposed method is assessed via extensive simulations and compared against other state-of-the-art Bayesian variable selection methods, demonstrating comparable or better performance. Finally, we demonstrate how the proposed method can be used for variable selection on two transcriptomic datasets with censored survival outcomes, and how the uncertainty quantification offered by our method can be used to provide an interpretable assessment of patient risk.
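As a rough illustration of how posterior inclusion probabilities can drive variable selection and how a variational posterior can yield an uncertainty band on a patient's risk, here is a minimal Python sketch. It is not the survival.svb API. It assumes, purely for illustration, that the mean-field posterior for each coefficient is a spike-and-slab: zero with probability `1 - gamma_j`, otherwise Normal(`mu_j`, `sigma_j`^2). The function names, the 0.5 threshold and all numeric values are hypothetical.

```python
import random


def select_variables(inclusion_probs, threshold=0.5):
    """Return indices whose posterior inclusion probability exceeds threshold."""
    return [j for j, g in enumerate(inclusion_probs) if g > threshold]


def sample_linear_predictor(x, mu, sigma, gamma, n_draws=2000, seed=0):
    """Monte Carlo draws of x^T beta under the assumed spike-and-slab posterior.

    Each beta_j is 0 with probability 1 - gamma_j, else Normal(mu_j, sigma_j^2).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        total = 0.0
        for xj, m, s, g in zip(x, mu, sigma, gamma):
            if rng.random() < g:  # coefficient is "included" in this draw
                total += xj * rng.gauss(m, s)
        draws.append(total)
    return draws


def interval(draws, level=0.95):
    """Empirical central interval of the draws at the given level."""
    ordered = sorted(draws)
    last = len(ordered) - 1
    lower = ordered[int((1 - level) / 2 * last)]
    upper = ordered[int((1 + level) / 2 * last)]
    return lower, upper


# Hypothetical fitted quantities for four genes (illustrative values only).
gamma = [0.98, 0.03, 0.71, 0.10]   # posterior inclusion probabilities
mu = [0.8, 0.0, -0.4, 0.1]         # slab means
sigma = [0.1, 0.2, 0.15, 0.3]      # slab standard deviations
patient = [1.2, 0.5, -0.7, 0.0]    # standardized expression values

selected = select_variables(gamma)  # genes 0 and 2 pass the 0.5 threshold
draws = sample_linear_predictor(patient, mu, sigma, gamma)
low, high = interval(draws)
print("selected:", selected)
print("95% interval for the log relative risk:", (low, high))
```

The interval on the linear predictor is one way to express patient-level risk with uncertainty; the paper's own risk assessment may be constructed differently.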

AVAILABILITY AND IMPLEMENTATION

Our method has been implemented as a freely available R package survival.svb (https://github.com/mkomod/survival.svb).

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aa35/9364383/c3cc02399849/btac416f1.jpg
