Newcombe P J, Raza Ali H, Blows F M, Provenzano E, Pharoah P D, Caldas C, Richardson S
1 MRC Biostatistics Unit, Cambridge, UK.
2 Cancer Research UK Cambridge Institute, Cambridge, UK.
Stat Methods Med Res. 2017 Feb;26(1):414-436. doi: 10.1177/0962280214548748. Epub 2016 Sep 30.
As data-rich medical datasets are becoming routinely collected, there is a growing demand for regression methodology that facilitates variable selection over a large number of predictors. Bayesian variable selection algorithms offer an attractive solution, whereby a sparsity inducing prior allows inclusion of sets of predictors simultaneously, leading to adjusted effect estimates and inference of which covariates are most important. We present a new implementation of Bayesian variable selection, based on a Reversible Jump MCMC algorithm, for survival analysis under the Weibull regression model. A realistic simulation study is presented comparing against an alternative LASSO-based variable selection strategy in datasets of up to 20,000 covariates. Across half the scenarios, our new method achieved identical sensitivity and specificity to the LASSO strategy, and a marginal improvement otherwise. Runtimes were comparable for both approaches, taking approximately a day for 20,000 covariates. Subsequently, we present a real data application in which 119 protein-based markers are explored for association with breast cancer survival in a case cohort of 2287 patients with oestrogen receptor-positive disease. Evidence was found for three independent prognostic tumour markers of survival, one of which is novel. Our new approach demonstrated the best specificity.
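As a minimal illustration of the survival model named in the abstract, the right-censored Weibull regression log-likelihood can be sketched as below. The abstract does not give the parameterisation, so this sketch assumes the common proportional hazards form h(t | x) = k t^(k-1) exp(α + xβ). The function name `weibull_log_likelihood` and all argument names are hypothetical, and the code is not taken from the authors' software. In this form, excluding a covariate from the model is equivalent to fixing its coefficient at zero, which is the quantity a variable selection sampler would switch on and off.

```python
import math


def weibull_log_likelihood(times, events, covariates, intercept, coefs, shape):
    """Right-censored Weibull log-likelihood (assumed proportional hazards form).

    Assumed hazard: h(t | x) = shape * t**(shape - 1) * exp(intercept + x . coefs)
    times      : positive follow-up times
    events     : 1 if the event was observed, 0 if right-censored
    covariates : one list of covariate values per subject
    """
    total = 0.0
    for t, d, x in zip(times, events, covariates):
        # Linear predictor for this subject.
        eta = intercept + sum(b * xj for b, xj in zip(coefs, x))
        if d:
            # Observed events contribute the log hazard at time t.
            total += math.log(shape) + (shape - 1.0) * math.log(t) + eta
        # Every subject contributes minus the cumulative hazard t**shape * exp(eta).
        total -= (t ** shape) * math.exp(eta)
    return total


# Toy data (made up for illustration): two subjects, one covariate.
toy_times = [1.0, 2.0]
toy_events = [1, 0]
toy_covariates = [[0.5], [-1.0]]
toy_ll = weibull_log_likelihood(toy_times, toy_events, toy_covariates,
                                intercept=0.0, coefs=[0.0], shape=1.0)
```

With `shape=1.0` the model reduces to exponential regression, which makes the toy value easy to check by hand. A full analysis would also need priors and a sampler over which covariates are included, neither of which is sketched here.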