Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom.
Structural and Biophysical Sciences, GlaxoSmithKline R&D, Stevenage SG1 2NY, United Kingdom.
J Proteome Res. 2022 Apr 1;21(4):849-864. doi: 10.1021/acs.jproteome.1c00859. Epub 2022 Mar 8.
Proteomics is a data-rich science with complex experimental designs and an intricate measurement process. To obtain insights from the large data sets produced, statistical methods, including machine learning, are routinely applied. For a quantity of interest, many of these approaches only produce a point estimate, such as a mean, leaving little room for more nuanced interpretations. By contrast, Bayesian statistics allows quantification of uncertainty through the use of probability distributions. These probability distributions enable scientists to ask complex questions of their proteomics data. Bayesian statistics also offers a modular framework for data analysis by making dependencies between data and parameters explicit. Hence, specifying complex hierarchies of parameter dependencies is straightforward in the Bayesian framework. This allows us to use a statistical methodology which equals, rather than neglects, the sophistication of experimental design and instrumentation present in proteomics. Here, we review Bayesian methods applied to proteomics, demonstrating their potential power, alongside the challenges posed by adopting this new statistical framework. To illustrate our review, we give a walk-through of the development of a Bayesian model for dynamic organic orthogonal phase-separation (OOPS) data.
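As a minimal sketch of the contrast drawn above between a point estimate and a full probability distribution, consider a conjugate Normal model for a single protein's log2 fold change. This is not the model developed in the review: the data values, the prior, the assumed known measurement noise, and the helper name `normal_posterior` are all hypothetical choices made for illustration.

```python
from statistics import NormalDist, mean


def normal_posterior(data, prior_mean, prior_sd, noise_sd):
    """Posterior over an unknown mean, assuming a Normal prior on the mean
    and Normal measurement noise with a known standard deviation."""
    n = len(data)
    prior_precision = 1.0 / prior_sd**2
    data_precision = n / noise_sd**2
    post_precision = prior_precision + data_precision
    # Precision-weighted average of the prior mean and the sample mean.
    post_mean = (prior_precision * prior_mean + data_precision * mean(data)) / post_precision
    return NormalDist(post_mean, post_precision**-0.5)


# Hypothetical replicate log2 fold changes for one protein (illustrative only).
log2_fc = [0.8, 1.1, 0.3, 0.9]

# Point estimate: a single number with no statement of uncertainty.
point_estimate = mean(log2_fc)

# Bayesian alternative: a distribution over the unknown mean.
posterior = normal_posterior(log2_fc, prior_mean=0.0, prior_sd=1.0, noise_sd=0.5)

# Questions the distribution can answer that the point estimate cannot.
credible_interval = (posterior.inv_cdf(0.025), posterior.inv_cdf(0.975))
prob_upregulated = 1.0 - posterior.cdf(0.0)

print(f"point estimate:        {point_estimate:.3f}")
print(f"posterior mean (sd):   {posterior.mean:.3f} ({posterior.stdev:.3f})")
print(f"95% credible interval: ({credible_interval[0]:.3f}, {credible_interval[1]:.3f})")
print(f"P(mean > 0 | data):    {prob_upregulated:.4f}")
```

The posterior mean is pulled slightly toward the prior mean of zero, and the credible interval and tail probability come directly from the posterior rather than from a separate testing procedure. Hierarchical models of the kind the abstract mentions would extend this by placing shared priors across proteins or conditions, which generally requires numerical inference rather than a closed-form update.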