Treppmann Tabea, Ickstadt Katja, Zucknick Manuela
EXCO, Penzberg, Germany.
Department of Statistics, TU Dortmund University, Dortmund, Germany.
Comput Math Methods Med. 2017;2017:7340565. doi: 10.1155/2017/7340565. Epub 2017 Jul 30.
Bayesian variable selection is becoming increasingly important in statistical analyses, particularly for high-dimensional data. For survival time models with genomic data, however, its potential remains largely unexploited. One of the more recent approaches proposes a Bayesian semiparametric proportional hazards model for right-censored time-to-event data. We extend this model to include variable selection directly, based on a stochastic search procedure within a Markov chain Monte Carlo sampler for inference. This gives us an intuitive and flexible approach and provides a way to integrate additional data sources and further extensions. We use parallel tempering to improve the mixing of the Markov chains. In our examples, we use this Bayesian approach to integrate copy number variation data into a gene-expression-based survival prediction model. This is achieved by formulating an informed prior based on copy number variation. We perform a simulation study to investigate the model's behavior and prediction performance in different situations before applying it to a dataset of glioblastoma patients and evaluating the biological relevance of the findings.
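The abstract mentions parallel tempering as a way to improve mixing. As a generic illustration of that technique only, and not the authors' implementation, the following minimal sketch shows the standard Metropolis swap step between two tempered chains. It assumes chain k targets the posterior raised to the power betas[k]; the function names, the list-based storage of states, and the use of untempered log posterior values are all assumptions made for this example.

```python
import math
import random


def swap_log_ratio(beta_i, beta_j, logpost_i, logpost_j):
    """Log acceptance ratio for exchanging the states of two tempered chains.

    Chain k is assumed to target posterior**beta_k, and logpost_k is the
    untempered log posterior evaluated at chain k's current state.
    """
    return (beta_i - beta_j) * (logpost_j - logpost_i)


def propose_swap(states, logposts, betas, i, j, rng=random):
    """Attempt to swap the states of chains i and j (hypothetical helper).

    Mutates `states` and `logposts` in place and returns True if the
    swap was accepted, False otherwise.
    """
    log_r = swap_log_ratio(betas[i], betas[j], logposts[i], logposts[j])
    # Accept with probability min(1, exp(log_r)).
    if log_r >= 0 or rng.random() < math.exp(log_r):
        states[i], states[j] = states[j], states[i]
        logposts[i], logposts[j] = logposts[j], logposts[i]
        return True
    return False
```

In a full sampler, each chain would first run its own within-chain updates (for example, the stochastic search over inclusion indicators), and a swap like this would then be proposed between chains at adjacent temperatures.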