iProX in 2021: connecting proteomics data sharing with big data.

Affiliations

State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China.

School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 26469, China.

Publication information

Nucleic Acids Res. 2022 Jan 7;50(D1):D1522-D1527. doi: 10.1093/nar/gkab1081.

DOI:10.1093/nar/gkab1081

PMID:34871441

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8728291/

Abstract

The rapid development of proteomics studies has resulted in large volumes of experimental data. The emergence of big data platforms provides an opportunity to handle these data. The integrated proteome resource, iProX (https://www.iprox.cn), which was initiated in 2017, has been greatly improved by an up-to-date big data platform implemented in 2021. Here, we describe the main iProX developments since its first publication in Nucleic Acids Research in 2019. First, a hyper-converged architecture with high scalability supports the submission process. A Hadoop cluster can store large amounts of proteomics datasets, and a distributed, RESTful Elasticsearch engine can query millions of records within one second. In addition, several new features have been added to iProX for better open data sharing, including the Universal Spectrum Identifier (USI) mechanism proposed by ProteomeXchange, a RESTful Web Service API, and a high-efficiency reanalysis pipeline. By the end of August 2021, 1526 datasets had been submitted to iProX, reaching a total data volume of 92.42 TB. With the implementation of the big data platform, iProX can support PB-level data storage, hundreds of billions of spectra records, and second-level latency service capabilities that meet the requirements of the fast-growing field of proteomics.
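As an illustration of the USI mechanism mentioned in the abstract, the following minimal Python sketch splits a USI string into its parts. It assumes the general layout `mzspec:<collection>:<msRun>:<indexType>:<indexNumber>[:<interpretation>]` and, for simplicity, assumes the run name itself contains no colon. The names `USI` and `parse_usi` are hypothetical helpers introduced here, not part of iProX or any ProteomeXchange library, and the example identifier is illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

# Index types assumed by this sketch.
INDEX_TYPES = {"scan", "index", "nativeId"}


@dataclass(frozen=True)
class USI:
    """Hypothetical container for the parts of a Universal Spectrum Identifier."""
    collection: str
    ms_run: str
    index_type: str
    index: str
    interpretation: Optional[str] = None


def parse_usi(usi: str) -> USI:
    """Split a USI into its fields (assumes no colon inside the run name)."""
    # maxsplit=5 keeps any colons inside the optional interpretation intact.
    parts = usi.split(":", 5)
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a USI: {usi!r}")
    _, collection, ms_run, index_type, index = parts[:5]
    if index_type not in INDEX_TYPES:
        raise ValueError(f"unknown index type: {index_type!r}")
    interpretation = parts[5] if len(parts) == 6 else None
    return USI(collection, ms_run, index_type, index, interpretation)


# Illustrative identifier; not a claim about any dataset hosted by iProX.
example = parse_usi(
    "mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555:VLHPLEGAVVIIFK/2"
)
print(example.collection, example.index_type, example.index)
```

A stricter parser would have to locate the index-type token instead of splitting on position, since run names may legitimately contain colons.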

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4ebd/8728291/7bf44f016ffb/gkab1081fig1.jpg

Similar articles

iProX in 2021: connecting proteomics data sharing with big data.

Nucleic Acids Res. 2022 Jan 7;50(D1):D1522-D1527. doi: 10.1093/nar/gkab1081.

iProX: an integrated proteome resource.

Nucleic Acids Res. 2019 Jan 8;47(D1):D1211-D1217. doi: 10.1093/nar/gky869.

The ProteomeXchange consortium in 2020: enabling 'big data' approaches in proteomics.

Nucleic Acids Res. 2020 Jan 8;48(D1):D1145-D1152. doi: 10.1093/nar/gkz984.

The ProteomeXchange consortium at 10 years: 2023 update.

Nucleic Acids Res. 2023 Jan 6;51(D1):D1539-D1548. doi: 10.1093/nar/gkac1040.

Proteomic repository data submission, dissemination, and reuse: key messages.

Expert Rev Proteomics. 2022 Jul-Dec;19(7-12):297-310. doi: 10.1080/14789450.2022.2160324. Epub 2022 Dec 26.

quantms: a cloud-based pipeline for quantitative proteomics enables the reanalysis of public proteomics data.

Nat Methods. 2024 Sep;21(9):1603-1607. doi: 10.1038/s41592-024-02343-1. Epub 2024 Jul 4.

The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition.

Nucleic Acids Res. 2017 Jan 4;45(D1):D1100-D1106. doi: 10.1093/nar/gkw936. Epub 2016 Oct 18.

The PRIDE database and related tools and resources in 2019: improving support for quantification data.

Nucleic Acids Res. 2019 Jan 8;47(D1):D442-D450. doi: 10.1093/nar/gky1106.

jPOSTrepo: an international standard data repository for proteomes.

Nucleic Acids Res. 2017 Jan 4;45(D1):D1107-D1111. doi: 10.1093/nar/gkw1080. Epub 2016 Nov 28.

A platform to standardize, store, and visualize proteomics experimental data.

Acta Biochim Biophys Sin (Shanghai). 2009 Apr;41(4):273-9. doi: 10.1093/abbs/gmp010.

Cited by

Cardiomyocyte mitochondrial mono-ADP-ribosylation dictates cardiac tolerance to sepsis by configuring bioenergetic reserve in male mice.

Nat Commun. 2025 Aug 30;16(1):8119. doi: 10.1038/s41467-025-62384-8.

DCAF7 recruits USP2 to facilitate hepatocellular carcinoma progression by suppressing clockophagy-induced ferroptosis.

Cell Death Dis. 2025 Aug 28;16(1):654. doi: 10.1038/s41419-025-07977-3.

Comparative Multi-Omics Analysis and Antitumor Activity of and .

J Microbiol Biotechnol. 2025 Aug 28;35:e2504029. doi: 10.4014/jmb.2504.04029.

Chromatin Remodeler RSF1 as an Oncogenic Driver and Therapeutic Target in Esophageal Squamous Cell Carcinoma.

Cells. 2025 Aug 15;14(16):1262. doi: 10.3390/cells14161262.

Crotonylation of IDH1 alleviates MASLD progression by enhancing the TCA cycle.

Nat Commun. 2025 Aug 26;16(1):7961. doi: 10.1038/s41467-025-62731-9.

Multi-omics analysis reveals RNA polymerase II degradation as a novel mechanism of PF-3758309's anti-tumor activity.

Cell Death Discov. 2025 Aug 25;11(1):404. doi: 10.1038/s41420-025-02677-5.

H4K79 and H4K91 histone lactylation, newly identified lactylation sites enriched in breast cancer.

J Exp Clin Cancer Res. 2025 Aug 23;44(1):252. doi: 10.1186/s13046-025-03512-6.

Integrative proteogenomic characterization of Wilms tumor.

Nat Commun. 2025 Aug 19;16(1):7715. doi: 10.1038/s41467-025-62234-7.

Speedy A governs non-homologous XY chromosome desynapsis as a unique prerequisite for XY loop-axis organization.

EMBO J. 2025 Aug 18. doi: 10.1038/s44318-025-00528-8.

The pathogenicity and multi-organ proteomic profiles of Mpox virus infection in SIVmac239-infected rhesus macaques.

Nat Commun. 2025 Aug 17;16(1):7653. doi: 10.1038/s41467-025-62919-z.

References

Universal Spectrum Identifier for mass spectra.

Nat Methods. 2021 Jul;18(7):768-770. doi: 10.1038/s41592-021-01184-6. Epub 2021 Jun 28.

Universal Spectrum Explorer: A Standalone (Web-)Application for Cross-Resource Spectrum Comparison.

J Proteome Res. 2021 Jun 4;20(6):3388-3394. doi: 10.1021/acs.jproteome.1c00096. Epub 2021 May 10.

Data Management of Sensitive Human Proteomics Data: Current Practices, Recommendations, and Perspectives for the Future.

Mol Cell Proteomics. 2021;20:100071. doi: 10.1016/j.mcpro.2021.100071. Epub 2021 Mar 10.

Ethical Principles, Constraints and Opportunities in Clinical Proteomics.

Mol Cell Proteomics. 2021 Jan 14;20:100046. doi: 10.1016/j.mcpro.2021.100046.

The ProteomeXchange consortium in 2020: enabling 'big data' approaches in proteomics.

Nucleic Acids Res. 2020 Jan 8;48(D1):D1145-D1152. doi: 10.1093/nar/gkz984.

Enabling Massive XML-Based Biological Data Management in HBase.

IEEE/ACM Trans Comput Biol Bioinform. 2020 Nov-Dec;17(6):1994-2004. doi: 10.1109/TCBB.2019.2915811. Epub 2020 Dec 8.

The challenges of big data biology.

Elife. 2019 Apr 5;8:e47381. doi: 10.7554/eLife.47381.

Proteomics identifies new therapeutic targets of early-stage hepatocellular carcinoma.

Nature. 2019 Mar;567(7747):257-261. doi: 10.1038/s41586-019-0987-8. Epub 2019 Feb 27.

The application of Hadoop in structural bioinformatics.

Brief Bioinform. 2020 Jan 17;21(1):96-105. doi: 10.1093/bib/bby106.

The PRIDE database and related tools and resources in 2019: improving support for quantification data.

Nucleic Acids Res. 2019 Jan 8;47(D1):D442-D450. doi: 10.1093/nar/gky1106.
