Huckstep Hannah, Fearnley Liam G, Davis Melissa J
Division of Bioinformatics, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.
Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Parkville, Victoria, Australia.
PeerJ. 2021 May 25;9:e11298. doi: 10.7717/peerj.11298. eCollection 2021.
Protein phosphorylation is one of the best-known post-translational modifications and plays a key role in the regulation of cellular processes. Over the last decade, continual improvements in mass spectrometry-based phosphoproteomics have led to the discovery of more than 100,000 distinct phosphorylation sites. However, data saturation is now occurring, and the bottleneck of assigning biologically relevant functions to phosphosites needs to be addressed. Data-driven approaches have had limited success in revealing phosphosite function because of a range of limitations. An alternative, more suitable approach is to make use of prior knowledge from literature-derived databases. Here, we analysed seven widely used databases to assess their suitability for providing functional insights into phosphoproteomics data. We first determined the global coverage of each database at both the protein and phosphosite level. We also determined how consistent each database's phosphorylation annotations were compared with a global standard. Finally, we examined in detail the coverage of each database over six experimental datasets. Our analysis highlights the relative strengths and weaknesses of each database and provides a guide to how each can best be used to identify biological mechanisms in phosphoproteomic data.
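The protein- and phosphosite-level coverage described above can be illustrated with simple set overlaps. The following is a minimal sketch, not the authors' pipeline: it assumes each phosphosite is represented as a hypothetical `Site` record of (protein accession, residue, position), and the `coverage` function name and the toy identifiers (`P1`, `P2`, ...) are illustrative only, not real database entries.

```python
from typing import NamedTuple


class Site(NamedTuple):
    """A phosphosite keyed by protein accession, residue and position (hypothetical schema)."""
    protein: str
    residue: str
    position: int


def coverage(dataset: set[Site], database: set[Site]) -> tuple[float, float]:
    """Return (protein-level, site-level) fractions of the dataset present in the database."""
    if not dataset:
        return 0.0, 0.0
    # Protein level: a protein counts as covered if the database annotates any site on it.
    data_proteins = {s.protein for s in dataset}
    db_proteins = {s.protein for s in database}
    protein_cov = len(data_proteins & db_proteins) / len(data_proteins)
    # Site level: the exact (protein, residue, position) triple must be annotated.
    site_cov = len(dataset & database) / len(dataset)
    return protein_cov, site_cov


# Toy example with made-up identifiers.
dataset = {Site("P1", "S", 10), Site("P1", "T", 25), Site("P2", "Y", 7), Site("P3", "S", 3)}
database = {Site("P1", "S", 10), Site("P2", "S", 99), Site("P4", "T", 1)}

prot_cov, site_cov = coverage(dataset, database)
print(f"protein coverage {prot_cov:.2f}, site coverage {site_cov:.2f}")
# prints "protein coverage 0.67, site coverage 0.25"
```

The gap between the two numbers shows why both levels matter: a database may annotate many of the proteins seen in an experiment while containing few of the specific sites observed on them.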