MS-PyCloud: A Cloud Computing-Based Pipeline for Proteomic and Glycoproteomic Data Analyses.

Affiliation

Department of Pathology, School of Medicine, Johns Hopkins University, Baltimore, Maryland 21231, United States.

Publication information

Anal Chem. 2024 Jun 25;96(25):10145-10151. doi: 10.1021/acs.analchem.3c01497. Epub 2024 Jun 13.

Abstract

The rapid development and wide adoption of mass spectrometry-based glycoproteomic technologies have enabled scientists to study proteins and protein glycosylation in complex samples on a large scale. This progress has also created unprecedented challenges for individual laboratories in storing, managing, and analyzing proteomic and glycoproteomic data: proprietary software and high-performance computing are costly, and long processing times discourage the on-the-fly changes to data processing settings that exploratory and discovery analyses require. We developed MS-PyCloud, an open-source, cloud computing-based pipeline with a graphical user interface (GUI), for proteomic and glycoproteomic data analysis. The major components of the pipeline are data file integrity validation, MS/MS database search for assigning spectra to peptide sequences, false discovery rate estimation, protein inference, and quantitation of global protein levels, of specific glycan-modified glycopeptides, and of other modification-specific peptides, such as those carrying phosphorylation, acetylation, or ubiquitination. To ensure the transparency and reproducibility of data analysis, MS-PyCloud includes open-source software tools with comprehensive testing and versioning for spectrum assignments. By leveraging public cloud computing infrastructure through Amazon Web Services (AWS), MS-PyCloud scales with analysis demand to achieve fast and efficient performance. Application of the pipeline to large-scale LC-MS/MS data sets demonstrated its effectiveness and high performance. The software can be downloaded at https://github.com/huizhanglab-jhu/ms-pycloud.
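The false discovery rate estimation step named in the abstract can be illustrated with a minimal sketch of the generic target-decoy approach. This is not taken from the MS-PyCloud source: the `PSM` class, its field names, and the `filter_at_fdr` function are hypothetical, and the abstract does not say which estimator the pipeline uses. The sketch assumes each peptide-spectrum match carries a score (higher is better) and a flag marking matches to decoy sequences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """A peptide-spectrum match; all field names here are hypothetical."""
    spectrum_id: str
    score: float    # assumption: a higher score means a better match
    is_decoy: bool  # True if the matched sequence came from the decoy database


def filter_at_fdr(psms, threshold=0.01):
    """Return the target PSMs accepted at the given estimated FDR.

    At each score cutoff the FDR is estimated as
    (number of decoys) / (number of targets) among PSMs at or above the cutoff.
    The most permissive cutoff whose estimate stays within the threshold wins.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    decoys = 0
    targets = 0
    cutoff_index = 0  # number of top-ranked PSMs to keep
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= threshold:
            cutoff_index = i
    # Decoys above the cutoff are used only for estimation and are not reported.
    return [p for p in ranked[:cutoff_index] if not p.is_decoy]


if __name__ == "__main__":
    example = [
        PSM("s1", 10.0, False),
        PSM("s2", 9.0, False),
        PSM("s3", 8.0, False),
        PSM("s4", 7.0, True),
        PSM("s5", 6.0, False),
    ]
    accepted = filter_at_fdr(example, threshold=0.01)
    print([p.spectrum_id for p in accepted])
```

Keeping the most permissive cutoff, rather than stopping at the first decoy, lets a later run of target matches bring the estimate back under the threshold, which is the usual behavior of q-value style filtering.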
