Perez-Riverol Yasset
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
J Proteome Res. 2020 Oct 2;19(10):3906-3909. doi: 10.1021/acs.jproteome.0c00376. Epub 2020 Sep 22.
Metadata is essential in proteomics data repositories and crucial for interpreting and reanalyzing the deposited data sets. For every proteomics data set, we should capture at least three levels of metadata: (i) the data set description, (ii) the information relating samples to data files, and (iii) standard data file formats (e.g., mzIdentML, mzML, or mzTab). While the data set description and standard data file formats are supported by all ProteomeXchange partners, the information relating samples to data files is mostly missing. Recently, members of the European Bioinformatics Community for Mass Spectrometry (EuBIC) created an open-source project called Sample to Data file format for Proteomics (https://github.com/bigbio/proteomics-metadata-standard/) to enable the standardization of sample metadata in public proteomics data sets. Here, we present the project to the proteomics community and call on contributors, including researchers, journals, and consortia, to provide feedback on the format. We believe this work will improve reproducibility and facilitate the development of new tools dedicated to proteomics data analysis.
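To make the second metadata level concrete, the following is a minimal sketch of what a sample-to-data-file table could look like and how a tool might read it. The column names (sample_id, organism, tissue, data_file), the values, and the helper functions are hypothetical and chosen only for illustration; they are not taken from the project's specification.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-delimited table: one row per (sample, data file) pair.
# Column names and values are illustrative assumptions, not the real format.
TABLE = (
    "sample_id\torganism\ttissue\tdata_file\n"
    "sample_1\tHomo sapiens\tliver\trun_01.mzML\n"
    "sample_1\tHomo sapiens\tliver\trun_02.mzML\n"
    "sample_2\tHomo sapiens\tkidney\trun_03.mzML\n"
)


def files_by_sample(text):
    """Group data file names by the sample they were acquired from."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    mapping = defaultdict(list)
    for row in reader:
        mapping[row["sample_id"]].append(row["data_file"])
    return dict(mapping)


def samples_missing_files(sample_ids, mapping):
    """Return the samples that have no data file recorded in the mapping."""
    return [s for s in sample_ids if not mapping.get(s)]


mapping = files_by_sample(TABLE)
```

With such a table, a reanalysis tool could look up which files belong to each sample, and a repository could flag samples that are described but have no associated data file.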