Gatto Laurent, Breckels Lisa M, Burger Thomas, Nightingale Daniel J H, Groen Arnoud J, Campbell Callum, Nikolovski Nino, Mulvey Claire M, Christoforou Andy, Ferro Myriam, Lilley Kathryn S
From the ‡Cambridge Centre for Proteomics, Department of Biochemistry, Tennis Court Road, University of Cambridge, Cambridge, CB2 1QR, United Kingdom; §Computational Proteomics Unit, Department of Biochemistry, Tennis Court Road, University of Cambridge, Cambridge, CB2 1QR, United Kingdom; and ¶Université Grenoble-Alpes, CEA (iRSTV/BGE), INSERM (U1038), CNRS (FR3425), F-38054 Grenoble, France.
Mol Cell Proteomics. 2014 Aug;13(8):1937-52. doi: 10.1074/mcp.M113.036350. Epub 2014 May 20.
Quantitative mass-spectrometry-based spatial proteomics involves elaborate, expensive, and time-consuming experimental procedures, and considerable effort is invested in the generation of such data. Multiple research groups have described a variety of approaches for establishing high-quality proteome-wide datasets. However, data analysis is as critical as data production for reliable and insightful biological interpretation, and no consistent and robust solutions have been offered to the community so far. Here, we introduce the requirements for rigorous spatial proteomics data analysis, as well as the statistical machine learning methodologies needed to address them, including supervised and semi-supervised machine learning, clustering, and novelty detection. We present freely available software solutions that implement innovative state-of-the-art analysis pipelines and illustrate the use of these tools through several case studies involving multiple organisms, experimental designs, mass spectrometry platforms, and quantitation techniques. We also propose sound analysis strategies for identifying dynamic changes in subcellular localization by comparing and contrasting data describing different biological conditions. We conclude by discussing future needs and developments in spatial proteomics data analysis.