Kakourou Alexia, Mertens Bart
Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, 2300 RC Leiden, The Netherlands.
Biom J. 2018 Sep;60(5):1003-1020. doi: 10.1002/bimj.201700182. Epub 2018 Jun 25.
We explore the problem of variable selection in a case-control setting with mass spectrometry proteomic data consisting of paired measurements. Each pair corresponds to a distinct isotope cluster, and each component within a pair represents a summary of isotopic expression based on either the intensity or the shape of the cluster. Our objective is to identify a collection of isotope clusters associated with the disease outcome and, at the same time, to assess the added predictive value of shape beyond intensity while maintaining predictive performance. We propose a Bayesian model that exploits the paired structure of our data and utilizes prior information on the relative predictive power of each source by introducing multiple layers of selection. This allows us to make simultaneous inference on which pairs are the most informative and for which pairs, and to what extent, shape has complementary value in separating the two groups. We evaluate the Bayesian model on pancreatic cancer data. Results from the fitted model show that most of the predictive potential is achieved with a subset of just six (out of 1289) pairs, while the contribution of the intensity components is much higher than that of the shape components. To demonstrate how the method behaves in a controlled setting, we consider a simulation study. Results from this study indicate that the proposed approach can successfully select the truly predictive pairs and accurately estimate the effects of both components although, in some cases, the model tends to overestimate the inclusion probability of the second component.
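As an illustration only, the idea of "multiple layers of selection" on paired predictors can be sketched as a small data-generating simulation. The abstract does not specify the priors or the likelihood, so everything below is an assumption: the function name `simulate_nested_selection`, the Bernoulli inclusion probabilities, the Gaussian effect sizes, and the logistic link are hypothetical choices, not the authors' model. The one structural feature taken from the text is that a shape component can contribute only when its pair is selected.

```python
import math
import random


def simulate_nested_selection(n_samples=200, n_pairs=50, p_pair=0.1,
                              p_shape_given_pair=0.3, seed=0):
    """Simulate case-control labels from paired predictors with two
    nested selection layers (hypothetical priors, for illustration)."""
    rng = random.Random(seed)

    # Layer 1: is the pair (isotope cluster) included at all?
    gamma = [rng.random() < p_pair for _ in range(n_pairs)]
    # Layer 2: shape can add value only if its pair is already included.
    delta = [g and rng.random() < p_shape_given_pair for g in gamma]

    # Assumed effect sizes: intensity effects are larger than shape effects.
    beta = [rng.gauss(0.0, 1.0) if g else 0.0 for g in gamma]
    alpha = [rng.gauss(0.0, 0.5) if d else 0.0 for d in delta]

    labels = []
    for _ in range(n_samples):
        # One intensity and one shape summary per pair, standard normal.
        intensity = [rng.gauss(0.0, 1.0) for _ in range(n_pairs)]
        shape = [rng.gauss(0.0, 1.0) for _ in range(n_pairs)]
        # Linear predictor combining both components of each selected pair.
        eta = sum(b * xi + a * xs
                  for b, a, xi, xs in zip(beta, alpha, intensity, shape))
        prob_case = 1.0 / (1.0 + math.exp(-eta))
        labels.append(rng.random() < prob_case)

    return gamma, delta, beta, alpha, labels


if __name__ == "__main__":
    gamma, delta, beta, alpha, labels = simulate_nested_selection(seed=1)
    print("pairs selected:", sum(gamma))
    print("pairs where shape also contributes:", sum(delta))
    print("cases:", sum(labels), "of", len(labels))
```

In a sketch like this, posterior inference would run in the opposite direction, estimating the two sets of inclusion indicators from observed labels. That step is omitted here because the abstract gives no detail on the sampler or the priors.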