Bioinformatics Institute, University of Auckland, Private Bag, 92019, Auckland, New Zealand.
BMC Bioinformatics. 2012 Jun 19;13:137. doi: 10.1186/1471-2105-13-137.
Two-dimensional polyacrylamide gel electrophoresis (2D PAGE) is commonly used to identify proteins that are differentially expressed under two or more experimental or observational conditions. Wu et al. (2009) developed a univariate probabilistic model to identify differential expression between Case and Control groups by applying a Likelihood Ratio Test (LRT) to each protein on a 2D PAGE gel. In contrast to commonly used statistical approaches, this model takes into account the two possible causes of missing values in 2D PAGE: either (1) the non-expression of a protein; or (2) a level of expression that falls below the limit of detection.
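The two causes of missing values described above can be illustrated with a minimal per-protein log-likelihood sketch. This is not the published implementation: it assumes, for illustration only, that log-intensities of an expressed protein are normally distributed, and the names `protein_loglik`, `p_expr`, `mu`, `sigma` and `detection_limit` are hypothetical. A missing spot (`None`) arises either because the protein is not expressed or because it is expressed below the detection limit.

```python
import math


def normal_logpdf(y, mu, sigma):
    """Log density of a Normal(mu, sigma) distribution at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of Normal(mu, sigma) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def protein_loglik(intensities, p_expr, mu, sigma, detection_limit):
    """Log-likelihood of one protein's spots in one group (illustrative).

    intensities: list of log-intensities; None marks a missing spot.
    p_expr: assumed probability that the protein is expressed.
    """
    # A spot is missing if the protein is not expressed, or if it is
    # expressed but its intensity falls below the detection limit.
    p_missing = (1.0 - p_expr) + p_expr * normal_cdf(detection_limit, mu, sigma)
    loglik = 0.0
    for y in intensities:
        if y is None:
            loglik += math.log(p_missing)
        else:
            # An observed spot implies expression with intensity y.
            loglik += math.log(p_expr) + normal_logpdf(y, mu, sigma)
    return loglik
```

Under these assumptions, a likelihood ratio test would compare the maximised log-likelihood with separate parameters for Case and Control against the maximised log-likelihood with shared parameters.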
We develop a global Bayesian model which extends the previously described model. Unlike the univariate approach, the model reported here is able to treat all differentially expressed proteins simultaneously. Whereas each protein is modelled by the univariate likelihood function previously described, several global distributions are used to model the underlying relationship between the parameters associated with individual proteins. These global distributions combine information across proteins to give more accurate estimates of the true parameters. In our implementation of the procedure, all parameters are recovered by Markov chain Monte Carlo (MCMC) integration. The 95% highest posterior density (HPD) intervals for the marginal posterior distributions are used to determine whether differences in protein expression are due to differences in mean expression intensities, and/or differences in the probabilities of expression.
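As a rough illustration of the HPD step, the sketch below computes an empirical HPD interval from a list of MCMC samples by finding the narrowest window containing the requested fraction of the sorted draws. It assumes a unimodal marginal posterior, and the function names `hpd_interval` and `excludes_zero` are hypothetical rather than taken from the BIDE-2D source. One plausible decision rule, assumed here, is to flag a protein when the 95% HPD interval of a between-group difference (in mean intensity or in probability of expression) excludes zero.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples (unimodal case)."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("samples must not be empty")
    # Number of consecutive sorted draws the interval must cover.
    window = min(n, max(1, math.ceil(mass * n)))
    best_lo, best_hi = ordered[0], ordered[window - 1]
    # Slide the window over the sorted draws and keep the narrowest one.
    for start in range(1, n - window + 1):
        lo, hi = ordered[start], ordered[start + window - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


def excludes_zero(interval):
    """True if the interval lies entirely above or entirely below zero."""
    lo, hi = interval
    return lo > 0.0 or hi < 0.0
```

For example, `excludes_zero(hpd_interval(diff_samples))` could be evaluated for each protein, where `diff_samples` is a hypothetical list of posterior draws of the Case minus Control difference.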
Simulation analyses showed that the global model is able to accurately recover the underlying global distributions, and to identify more differentially expressed proteins than the simple application of an LRT. The simulations also indicate that the probability of incorrectly identifying a protein as differentially expressed (i.e., the False Discovery Rate) is very low. The source code is available at https://github.com/stevenhwu/BIDE-2D.