Toward a Sample Metadata Standard in Public Proteomics Repositories.

Author information

Perez-Riverol Yasset

Affiliation

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.

Publication information

J Proteome Res. 2020 Oct 2;19(10):3906-3909. doi: 10.1021/acs.jproteome.0c00376. Epub 2020 Sep 22.

DOI:10.1021/acs.jproteome.0c00376

PMID:32786688

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7116434/

Abstract

Metadata is essential in proteomics data repositories and is crucial to interpret and reanalyze the deposited data sets. For every proteomics data set, we should capture at least three levels of metadata: (i) data set description, (ii) the sample to data files related information, and (iii) standard data file formats (e.g., mzIdentML, mzML, or mzTab). While the data set description and standard data file formats are supported by all ProteomeXchange partners, the information regarding the sample to data files is mostly missing. Recently, members of the European Bioinformatics Community for Mass Spectrometry (EuBIC) have created an open-source project called Sample to Data file format for Proteomics (https://github.com/bigbio/proteomics-metadata-standard/) to enable the standardization of sample metadata of public proteomics data sets. Here, the project is presented to the proteomics community, and we call for contributors, including researchers, journals, and consortiums to provide feedback about the format. We believe this work will improve reproducibility and facilitate the development of new tools dedicated to proteomics data analysis.

Similar articles

Toward a Sample Metadata Standard in Public Proteomics Repositories.

J Proteome Res. 2020 Oct 2;19(10):3906-3909. doi: 10.1021/acs.jproteome.0c00376. Epub 2020 Sep 22.

PRIDE Inspector Toolsuite: Moving Toward a Universal Visualization Tool for Proteomics Data Standard Formats and Quality Assessment of ProteomeXchange Datasets.

Mol Cell Proteomics. 2016 Jan;15(1):305-17. doi: 10.1074/mcp.O115.050229. Epub 2015 Nov 6.

ppx: Programmatic Access to Proteomics Data Repositories.

J Proteome Res. 2021 Sep 3;20(9):4621-4624. doi: 10.1021/acs.jproteome.1c00454. Epub 2021 Aug 3.

Tissue proteomics repositories for data reanalysis.

Mass Spectrom Rev. 2024 Nov-Dec;43(6):1270-1284. doi: 10.1002/mas.21860. Epub 2023 Aug 3.

The mzTab data exchange format: communicating mass-spectrometry-based proteomics and metabolomics experimental results to a wider audience.

Mol Cell Proteomics. 2014 Oct;13(10):2765-75. doi: 10.1074/mcp.O113.036681. Epub 2014 Jun 30.

Ibaqpy: A scalable Python package for baseline quantification in proteomics leveraging SDRF metadata.

J Proteomics. 2025 Jun 15;317:105440. doi: 10.1016/j.jprot.2025.105440. Epub 2025 Apr 21.

A proteomics sample metadata representation for multiomics integration and big data analysis.

Nat Commun. 2021 Oct 6;12(1):5854. doi: 10.1038/s41467-021-26111-3.

jmzTab: a java interface to the mzTab data standard.

Proteomics. 2014 Jun;14(11):1328-32. doi: 10.1002/pmic.201300560. Epub 2014 Apr 29.

ms-data-core-api: an open-source, metadata-oriented library for computational proteomics.

Bioinformatics. 2015 Sep 1;31(17):2903-5. doi: 10.1093/bioinformatics/btv250. Epub 2015 Apr 24.

The mzIdentML data standard for mass spectrometry-based proteomics results.

Mol Cell Proteomics. 2012 Jul;11(7):M111.014381. doi: 10.1074/mcp.M111.014381. Epub 2012 Feb 27.

Cited by

The Future of a Myriad of Accelerated Biodiscoveries Lies in AI-Powered Mass Spectrometry and Multiomics Integration.

J Mass Spectrom. 2025 Aug;60(8):e5157. doi: 10.1002/jms.5157.

What is the real value of omics data? Enhancing research outcomes and securing long-term data excellence.

Nucleic Acids Res. 2024 Nov 11;52(20):12130-12140. doi: 10.1093/nar/gkae901.

Ten simple rules for starting FAIR discussions in your community.

PLoS Comput Biol. 2023 Dec 14;19(12):e1011668. doi: 10.1371/journal.pcbi.1011668. eCollection 2023 Dec.

SMetaS: A Sample Metadata Standardizer for Metabolomics.

Metabolites. 2023 Aug 12;13(8):941. doi: 10.3390/metabo13080941.

Meta-analysis of published cerebrospinal fluid proteomics data identifies and validates metabolic enzyme panel as Alzheimer's disease biomarkers.

Cell Rep Med. 2023 Apr 18;4(4):101005. doi: 10.1016/j.xcrm.2023.101005.

Implementing the reuse of public DIA proteomics datasets: from the PRIDE database to Expression Atlas.

Sci Data. 2022 Jun 14;9(1):335. doi: 10.1038/s41597-022-01380-9.

A knowledge graph to interpret clinical proteomics data.

Nat Biotechnol. 2022 May;40(5):692-702. doi: 10.1038/s41587-021-01145-6. Epub 2022 Jan 31.

The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences.

Nucleic Acids Res. 2022 Jan 7;50(D1):D543-D552. doi: 10.1093/nar/gkab1038.

A proteomics sample metadata representation for multiomics integration and big data analysis.

Nat Commun. 2021 Oct 6;12(1):5854. doi: 10.1038/s41467-021-26111-3.

Sharing biological data: why, when, and how.

FEBS Lett. 2021 Apr;595(7):847-863. doi: 10.1002/1873-3468.14067.

References

Quantitative Proteomics of the Cancer Cell Line Encyclopedia.

Cell. 2020 Jan 23;180(2):387-402.e16. doi: 10.1016/j.cell.2019.12.023.

ProteomicsDB: a multi-omics and multi-organism resource for life science research.

Nucleic Acids Res. 2020 Jan 8;48(D1):D1153-D1163. doi: 10.1093/nar/gkz974.

The PRIDE database and related tools and resources in 2019: improving support for quantification data.

Nucleic Acids Res. 2019 Jan 8;47(D1):D442-D450. doi: 10.1093/nar/gky1106.

Proteomics Standards Initiative: Fifteen Years of Progress and Future Work.

J Proteome Res. 2017 Dec 1;16(12):4288-4298. doi: 10.1021/acs.jproteome.7b00370. Epub 2017 Sep 15.

Experimental design and data-analysis in label-free quantitative LC/MS proteomics: A tutorial with MSqRob.

J Proteomics. 2018 Jan 16;171:23-36. doi: 10.1016/j.jprot.2017.04.004. Epub 2017 Apr 5.

A large dataset of protein dynamics in the mammalian heart proteome.

Sci Data. 2016 Mar 15;3:160015. doi: 10.1038/sdata.2016.15.

linkedISA: semantic representation of ISA-Tab experimental metadata.

BMC Bioinformatics. 2014;15 Suppl 14(Suppl 14):S4. doi: 10.1186/1471-2105-15-S14-S4. Epub 2014 Nov 27.

Identifying novel biomarkers through data mining-a realistic scenario?

Proteomics Clin Appl. 2015 Apr;9(3-4):437-43. doi: 10.1002/prca.201400107. Epub 2015 Jan 12.

Modeling experimental design for proteomics.

Methods Mol Biol. 2010;673:223-30. doi: 10.1007/978-1-60761-842-3_14.

A simple spreadsheet-based, MIAME-supportive format for microarray data: MAGE-TAB.

BMC Bioinformatics. 2006 Nov 6;7:489. doi: 10.1186/1471-2105-7-489.
