Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.
Bioinformatics. 2020 Feb 1;36(3):859-864. doi: 10.1093/bioinformatics/btz639.
Reversible protein phosphorylation is an essential post-translational modification regulating protein functions and signaling pathways in many cellular processes. Aberrant activation of signaling pathways often contributes to cancer development and progression. The mass spectrometry-based phosphoproteomics technique is a powerful tool to investigate the site-level phosphorylation of the proteome in a global fashion, paving the way for understanding the regulatory mechanisms underlying cancers. However, this approach is time-consuming and requires expensive instruments, specialized expertise and a large amount of starting material. An alternative in silico approach is predicting the phosphoproteomic profiles of cancer patients from the available proteomic, transcriptomic and genomic data.
Here, we present a winning algorithm in the 2017 NCI-CPTAC DREAM Proteogenomics Challenge for predicting phosphorylation levels of the proteome across cancer patients. Our algorithm integrates four components: (i) baseline correlations between protein and phosphoprotein abundances, (ii) universal protein-protein interactions, (iii) shareable regulatory information across cancer tissues and (iv) associations among multiple phosphorylation sites of the same protein. When tested on a large held-out test set of 108 breast and 62 ovarian cancer samples, our method ranked first in both cancer tissues, demonstrating its robustness and generalization ability.
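As a rough illustration of component (i) only, the sketch below fits a per-site straight line that predicts a phosphosite's level from the abundance of its parent protein. It is not the authors' implementation (that lives in the GitHub repository cited in this abstract); the function names, the use of ordinary least squares, the handling of missing values and the toy numbers are all assumptions made for the example.

```python
import numpy as np

def fit_baseline(protein, phospho):
    """Fit phospho ~ slope * protein + intercept for one phosphosite.

    Hypothetical helper: samples with a missing value (NaN) in either
    vector are ignored. If fewer than two usable samples remain, or the
    protein abundance is constant, fall back to predicting the mean.
    """
    mask = ~(np.isnan(protein) | np.isnan(phospho))
    x, y = protein[mask], phospho[mask]
    if x.size < 2 or x.max() == x.min():
        mean = float(y.mean()) if y.size else 0.0
        return 0.0, mean
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 least-squares fit
    return float(slope), float(intercept)

def predict_baseline(protein, slope, intercept):
    """Apply the fitted line to protein abundances of new samples."""
    return slope * np.asarray(protein, dtype=float) + intercept

# Toy training data (made up): one protein and one of its phosphosites
# measured across five patients, with one missing protein value.
train_protein = np.array([1.0, 2.0, 3.0, 4.0, np.nan])
train_phospho = np.array([3.0, 5.0, 7.0, 9.0, 1.0])

slope, intercept = fit_baseline(train_protein, train_phospho)
predicted = predict_baseline([5.0, 6.0], slope, intercept)
```

In a fuller pipeline, a baseline like this would be one input among several; the other three components listed above (interaction partners, cross-tissue information and neighbouring sites on the same protein) are not modelled here.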
Our code and reproducible results are freely available on GitHub: https://github.com/GuanLab/phosphoproteome_prediction.
Supplementary data are available at Bioinformatics online.