MS-PyCloud: A Cloud Computing-Based Pipeline for Proteomic and Glycoproteomic Data Analyses.

Affiliation

Department of Pathology, School of Medicine, Johns Hopkins University, Baltimore, Maryland 21231, United States.

Publication Information

Anal Chem. 2024 Jun 25;96(25):10145-10151. doi: 10.1021/acs.analchem.3c01497. Epub 2024 Jun 13.

Abstract

The rapid development and wide adoption of mass spectrometry-based glycoproteomic technologies have enabled scientists to study proteins and protein glycosylation in complex samples on a large scale. This progress has also created unprecedented challenges for individual laboratories that must store, manage, and analyze proteomic and glycoproteomic data. These challenges include the cost of proprietary software and high-performance computing, and long processing times that discourage the on-the-fly changes to data processing settings required in exploratory and discovery analyses. We developed an open-source, cloud computing-based pipeline, MS-PyCloud, with a graphical user interface (GUI), for proteomic and glycoproteomic data analysis. The major components of this pipeline include data file integrity validation, MS/MS database search for assigning spectra to peptide sequences, false discovery rate estimation, protein inference, and quantitation of global protein levels, of specific glycan-modified glycopeptides, and of peptides carrying other modifications such as phosphorylation, acetylation, and ubiquitination. To ensure transparent and reproducible data analysis, MS-PyCloud uses open-source software tools with comprehensive testing and versioning for spectrum assignment. By leveraging public cloud computing infrastructure via Amazon Web Services (AWS), MS-PyCloud scales seamlessly with analysis demand to achieve fast and efficient performance. Application of the pipeline to large-scale LC-MS/MS data sets demonstrated the effectiveness and high performance of MS-PyCloud. The software can be downloaded at https://github.com/huizhanglab-jhu/ms-pycloud.
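The abstract lists false discovery rate (FDR) estimation as a pipeline stage but does not say how it is computed. A widely used approach for MS/MS database search results is target-decoy estimation: spectra are also searched against reversed or shuffled "decoy" sequences, and at any score threshold the FDR is approximated as the number of decoy matches divided by the number of target matches above that threshold. The following is a minimal sketch of that general technique, assuming higher scores mean better peptide-spectrum matches (PSMs). It is not taken from the MS-PyCloud source, and the `PSM` class and the `target_decoy_qvalues` and `accept_targets` functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical peptide-spectrum match: a search score and a decoy flag."""
    score: float    # assumption: higher score = better match
    is_decoy: bool  # True if the matched sequence came from the decoy database


def target_decoy_qvalues(psms):
    """Return (psm, q_value) pairs sorted by descending score.

    The FDR at each rank is estimated as decoys / targets among all PSMs
    scoring at or above that rank (capped at 1.0). The q-value of a PSM is
    the minimum FDR at which it would still be accepted, obtained as a
    running minimum from the lowest-scoring PSM upward.
    Ties in score are not treated specially in this sketch.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)

    fdrs = []
    targets = decoys = 0
    for p in ranked:
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / targets) if targets else 1.0)

    qvals = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvals[i] = running_min

    return list(zip(ranked, qvals))


def accept_targets(psms, fdr_threshold=0.01):
    """Keep target (non-decoy) PSMs whose q-value is within the FDR threshold."""
    return [
        p for p, q in target_decoy_qvalues(psms)
        if not p.is_decoy and q <= fdr_threshold
    ]


if __name__ == "__main__":
    example = [
        PSM(10, False), PSM(9, False), PSM(8, False), PSM(7, True),
        PSM(6, False), PSM(5, True), PSM(4, False),
    ]
    for psm, q in target_decoy_qvalues(example):
        print(psm.score, psm.is_decoy, round(q, 3))
    print([p.score for p in accept_targets(example, 0.01)])
```

With the toy scores above, a 1% threshold keeps only the three targets that outrank every decoy, while a looser 30% threshold would also admit the target scored 6. Real pipelines typically compute this at the PSM, peptide, and protein levels separately, and may use different estimators.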
