Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts, USA.
Department of Mathematical Sciences, Kent State University, Kent, Ohio, USA.
Mol Cell Proteomics. 2023 Jan;22(1):100477. doi: 10.1016/j.mcpro.2022.100477. Epub 2022 Dec 8.
Liquid chromatography coupled with bottom-up mass spectrometry (LC-MS/MS)-based proteomics is increasingly used to detect changes in posttranslational modifications (PTMs) in samples from different conditions. Analysis of data from such experiments faces numerous statistical challenges. These include the low abundance of modified proteoforms, the small number of observed peptides that span modification sites, and confounding between changes in PTM abundance and overall changes in protein abundance. Therefore, statistical approaches for detecting differential PTM abundance must integrate all the available information pertaining to a PTM site and consider all the relevant sources of confounding and variation. In this manuscript, we propose such a statistical framework, which is versatile, accurate, and leads to reproducible results. The framework requires an experimental design that quantifies, for each sample, both peptides with PTMs and peptides from the same proteins with no modification sites. The proposed framework supports both label-free and tandem mass tag-based LC-MS/MS acquisitions. The statistical methodology summarizes the abundances of peptides with and without modification sites separately, by fitting separate linear mixed effects models appropriate for the experimental design. Next, model-based inferences regarding the PTM-level and protein-level abundances are combined to account for the confounding between these two sources. Evaluations on computer simulations, a spike-in experiment with known ground truth, and three biological experiments with different organisms, modification types, and data acquisition types demonstrate improved fold change estimation and detection of differential PTM abundance compared with currently used approaches. The proposed framework is implemented in the free and open-source R/Bioconductor package MSstatsPTM.
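The step that combines PTM-level and protein-level inferences can be illustrated with a minimal sketch. The abstract does not state the exact formulas, so the code below assumes one common way to remove protein-level confounding: subtract the protein log2 fold change from the PTM log2 fold change, add the two variances under an independence assumption, and approximate the degrees of freedom with the Welch-Satterthwaite formula. The function name `adjust_ptm_fold_change` and its parameters are hypothetical and are not part of the MSstatsPTM API.

```python
import math


def adjust_ptm_fold_change(ptm_log2fc, ptm_se, ptm_df,
                           prot_log2fc, prot_se, prot_df):
    """Adjust a PTM-site log2 fold change for the change in its protein.

    Assumptions (not taken from the abstract): the two estimates are
    independent, and the degrees of freedom of the difference follow
    the Welch-Satterthwaite approximation.
    """
    # Change in the modification beyond the change in overall protein abundance.
    adj_log2fc = ptm_log2fc - prot_log2fc

    # Variances of independent estimates add.
    adj_var = ptm_se ** 2 + prot_se ** 2
    adj_se = math.sqrt(adj_var)

    # Welch-Satterthwaite approximate degrees of freedom.
    adj_df = adj_var ** 2 / (ptm_se ** 4 / ptm_df + prot_se ** 4 / prot_df)

    # Test statistic for the null hypothesis of no adjusted change.
    t_stat = adj_log2fc / adj_se

    return {
        "log2fc": adj_log2fc,
        "se": adj_se,
        "df": adj_df,
        "t": t_stat,
    }


# Illustrative, made-up inputs: the PTM site changes by 1.5 log2 units
# while the protein itself changes by 0.5.
result = adjust_ptm_fold_change(1.5, 0.3, 10, 0.5, 0.4, 10)
print(result)
```

In this sketch, a site whose modified peptides change only as much as the protein yields an adjusted fold change near zero, which is the confounding the framework is designed to separate out.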