Department of Mathematics, Imperial College London, London SW7 2AZ, UK.
Department of Surgery and Cancer, Imperial College London, London W12 0NN, UK.
Bioinformatics. 2022 Aug 10;38(16):3918-3926. doi: 10.1093/bioinformatics/btac416.
Few Bayesian methods for analyzing high-dimensional sparse survival data provide scalable variable selection, effect estimation and uncertainty quantification. Existing methods often either sacrifice uncertainty quantification by computing maximum a posteriori estimates, or quantify the uncertainty at high (unscalable) computational expense.
We bridge this gap and develop an interpretable and scalable Bayesian proportional hazards model for prediction and variable selection, referred to as sparse variational Bayes. Our method, based on a mean-field variational approximation, avoids the high computational cost of Markov chain Monte Carlo whilst retaining its useful features: it provides a posterior distribution for the parameters and offers a natural mechanism for variable selection via posterior inclusion probabilities. The performance of our proposed method is assessed via extensive simulations and compared against other state-of-the-art Bayesian variable selection methods, demonstrating comparable or better performance. Finally, we demonstrate how the proposed method can be used for variable selection on two transcriptomic datasets with censored survival outcomes, and how the uncertainty quantification offered by our method can be used to provide an interpretable assessment of patient risk.
Our method has been implemented as a freely available R package survival.svb (https://github.com/mkomod/survival.svb).
Supplementary data are available at Bioinformatics online.
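As a rough illustration of how posterior inclusion probabilities can drive variable selection and risk assessment, the following Python sketch assumes a mean-field family in which each coefficient is independently either exactly zero or drawn from a normal "slab". The parameter names (`mu`, `sigma`, `gamma`), the 0.5 selection threshold and all numeric values are hypothetical choices made for illustration; the sketch is not the survival.svb API and does not fit a model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VariationalFactor:
    """Assumed per-coefficient factor: gamma * N(mu, sigma^2) + (1 - gamma) * point mass at 0."""
    mu: float     # slab mean (hypothetical fitted value)
    sigma: float  # slab standard deviation (hypothetical fitted value)
    gamma: float  # posterior inclusion probability


def posterior_mean(f: VariationalFactor) -> float:
    # E[beta] = gamma * mu, since the spike contributes zero.
    return f.gamma * f.mu


def posterior_var(f: VariationalFactor) -> float:
    # Var[beta] = E[beta^2] - E[beta]^2, with E[beta^2] = gamma * (sigma^2 + mu^2).
    return f.gamma * (f.sigma ** 2 + f.mu ** 2) - posterior_mean(f) ** 2


def select_variables(factors: List[VariationalFactor], threshold: float = 0.5) -> List[int]:
    # Keep the indices whose inclusion probability exceeds the threshold.
    return [j for j, f in enumerate(factors) if f.gamma > threshold]


def log_risk_summary(x: List[float], factors: List[VariationalFactor]) -> Tuple[float, float]:
    # Mean and variance of the linear predictor x^T beta (the log relative hazard).
    # Variances add because the mean-field factors are independent.
    mean = sum(xj * posterior_mean(f) for xj, f in zip(x, factors))
    var = sum(xj ** 2 * posterior_var(f) for xj, f in zip(x, factors))
    return mean, var


# Hypothetical fitted factors for three features.
factors = [
    VariationalFactor(mu=0.8, sigma=0.1, gamma=0.97),
    VariationalFactor(mu=0.0, sigma=0.05, gamma=0.02),
    VariationalFactor(mu=-0.5, sigma=0.2, gamma=0.60),
]

selected = select_variables(factors)
risk_mean, risk_var = log_risk_summary([1.0, 2.0, 0.5], factors)
print("selected features:", selected)
print("log relative hazard: mean", risk_mean, "variance", risk_var)
```

The variance returned alongside the mean is what allows a patient's risk score to be reported with an uncertainty band rather than as a single point estimate.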