Koh Hiromi W L, Swa Hannah L F, Fermin Damian, Ler Siok Ghee, Gunaratne Jayantha, Choi Hyungwon
Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore.
Institute of Molecular and Cell Biology, A*STAR, Singapore.
Proteomics. 2015 Aug;15(15):2580-91. doi: 10.1002/pmic.201400620. Epub 2015 May 28.
Labeling-based proteomics is a powerful method for detecting differentially expressed proteins (DEPs). Current data analysis platforms typically rely on protein-level ratios, which are obtained by summarizing the peptide-level ratios for each protein. In shotgun proteomics, however, some proteins are quantified with more peptides than others, and this information on reproducibility across peptides is not incorporated into the differential expression (DE) analysis. Here, we propose a novel probabilistic framework, EBprot, that directly models the peptide-protein hierarchy and rewards proteins with reproducible evidence of DE over multiple peptides. To evaluate its performance with known DE states, we conducted a simulation study showing that the peptide-level analysis of EBprot provides better receiver operating characteristic (ROC) performance and more accurate estimation of false discovery rates than methods based on protein-level ratios. We also demonstrate the superior classification performance of peptide-level EBprot analysis on a spike-in dataset. To illustrate the wide applicability of EBprot to different experimental designs, we applied it to a dataset for lung cancer subtype analysis with biological replicates and to another dataset for time-course phosphoproteome analysis of EGF-stimulated HeLa cells with multiplexed labeling. Through these examples, we show that the peptide-level analysis of EBprot is a robust alternative to existing statistical methods for the DE analysis of labeling-based quantitative datasets. The software suite is freely available on the SourceForge website http://ebprot.sourceforge.net/. All MS data have been deposited in ProteomeXchange with identifier PXD001426 (http://proteomecentral.proteomexchange.org/dataset/PXD001426/).
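The contrast between summarizing peptides into one protein-level ratio and modelling the peptide-protein hierarchy directly can be illustrated with a deliberately simplified sketch. This is not the EBprot model. The three-component normal mixture (null, up, down), its fixed parameter values, and the helper names `posterior_de` and `bayesian_fdr` are all hypothetical. A real empirical Bayes method would estimate such distributions from the data instead of fixing them. The only idea carried over from the abstract is that all peptides of a protein share one DE state, so consistent evidence across several peptides accumulates. The FDR helper uses one common Bayesian estimate, the mean posterior null probability of the selected proteins, and this choice is likewise an assumption.

```python
import math
from statistics import median

# Hypothetical, fixed mixture parameters for illustration only; an empirical
# Bayes method would estimate these from the observed log-ratios instead.
PRIOR = {"null": 0.8, "up": 0.1, "down": 0.1}
MEANS = {"null": 0.0, "up": 1.0, "down": -1.0}
SDS = {"null": 0.3, "up": 0.5, "down": 0.5}


def normal_logpdf(x, mean, sd):
    """Log density of the normal distribution N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def posterior_de(log_ratios):
    """Posterior probability that a protein is DE, given its peptide log-ratios.

    All peptides of one protein share a single DE state, so their
    log-likelihoods are summed under each state; consistent evidence over
    many peptides therefore moves the posterior further from the null.
    """
    log_joint = {
        state: math.log(PRIOR[state])
        + sum(normal_logpdf(x, MEANS[state], SDS[state]) for x in log_ratios)
        for state in PRIOR
    }
    # Log-sum-exp gives a numerically stable normalizing constant.
    top = max(log_joint.values())
    log_norm = top + math.log(sum(math.exp(v - top) for v in log_joint.values()))
    return 1.0 - math.exp(log_joint["null"] - log_norm)


def bayesian_fdr(posteriors):
    """Estimated FDR of the top-k list for every k, ranking by posterior DE.

    The estimate for the top k proteins is the mean of their posterior null
    probabilities (1 - posterior DE).
    """
    ranked = sorted(posteriors, reverse=True)
    fdr, running = [], 0.0
    for k, p in enumerate(ranked, start=1):
        running += 1.0 - p
        fdr.append(running / k)
    return fdr


one_peptide = [0.8]
four_peptides = [0.8, 0.7, 0.9, 0.8]

print(posterior_de(one_peptide))               # about 0.71
print(posterior_de(four_peptides))             # above 0.999
# A protein-level summary collapses the four peptides to one value (median 0.8),
# so it gets exactly the same posterior as the single-peptide protein.
print(posterior_de([median(four_peptides)]))   # about 0.71
print(bayesian_fdr([posterior_de(one_peptide), posterior_de(four_peptides)]))
```

In this toy setting the four concordant peptides raise the posterior well above that of a single peptide with the same ratio. The median-based summary gives the same value for both proteins, which shows how the peptide count information can be lost.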