Baldur: Bayesian Hierarchical Modeling for Label-Free Proteomics with Gamma Regressing Mean-Variance Trends.

Affiliations

Institute for Genomics, Biocomputing & Biotechnology, Mississippi State University, Mississippi State, Mississippi, USA; Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Mississippi State, Mississippi, USA.

Publication information

Mol Cell Proteomics. 2023 Dec;22(12):100658. doi: 10.1016/j.mcpro.2023.100658. Epub 2023 Oct 7.

Abstract

Label-free proteomics is a fast-growing methodology to infer abundances in mass spectrometry proteomics. Extensive research has focused on spectral quantification and peptide identification. However, research toward modeling and understanding quantitative proteomics data is scarce. Here we propose a Bayesian hierarchical decision model (Baldur) to test for differences in means between conditions for proteins, peptides, and post-translational modifications. We developed a Bayesian regression model to characterize local mean-variance trends in data, to estimate measurement uncertainty and hyperparameters for the decision model. A key contribution is the development of a new gamma regression model that describes the mean-variance dependency as a mixture of a common and a latent trend, allowing for localized trend estimates. We then evaluate the performance of Baldur, limma-trend, and t test on six benchmark datasets: five total proteomics and one post-translational modification dataset. We find that Baldur drastically improves the decision in noisier post-translational modification data over limma-trend and t test. In addition, we see significant improvements using Baldur over the other methods in the total proteomics datasets. Finally, we analyzed Baldur's performance when increasing the number of replicates and found that the method always increases precision with sample size, while showing robust control of the false positive rate. We conclude that our model vastly improves over popular data analysis methods (limma-trend and t test) in several spike-in datasets by achieving a high true positive detection rate, while greatly reducing the false-positive rate.
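To make the idea of a gamma-regressed mean-variance trend concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single common trend (no latent component and no Bayesian priors), simulated data, and a maximum-likelihood fit of a gamma regression with a log link via iteratively reweighted least squares. All names (`fit_gamma_log_link`, `mean_abundance`, `observed_sd`) and the simulated parameter values are hypothetical.

```python
import numpy as np

def fit_gamma_log_link(x, y, n_iter=50, tol=1e-10):
    """Fit E[y] = exp(b0 + b1 * x) for gamma-distributed y by IRLS.

    With a log link and gamma variance function V(mu) = mu^2, the IRLS
    weights are constant, so each step is an ordinary least-squares fit
    of the working response on the design matrix.
    """
    X = np.column_stack([np.ones_like(x), x])
    # Starting values: least squares on log(y).
    beta = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu  # working response
        new_beta = np.linalg.lstsq(X, z, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Simulated (assumed) data: per-peptide mean abundance and a standard
# deviation whose expectation decreases with the mean.
rng = np.random.default_rng(0)
n = 2000
mean_abundance = rng.uniform(10, 30, size=n)
true_intercept, true_slope = 1.0, -0.1
shape = 5.0  # gamma shape; controls scatter around the trend
expected_sd = np.exp(true_intercept + true_slope * mean_abundance)
observed_sd = rng.gamma(shape, expected_sd / shape)

intercept, slope = fit_gamma_log_link(mean_abundance, observed_sd)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

The fitted trend gives an expected standard deviation for any mean abundance, which is the kind of quantity the abstract describes as feeding measurement uncertainty into the decision model. The mixture of a common and a latent trend described above would require additional per-observation parameters that this sketch omits.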
