Mass spectrometry-based proteomics data from thousands of HeLa control samples.

Affiliations

Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen N, Denmark.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

Publication information

Sci Data. 2024 Jan 23;11(1):112. doi: 10.1038/s41597-024-02922-z.

DOI:10.1038/s41597-024-02922-z

PMID:38263211

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10806275/

Abstract

Here we provide a curated, large-scale, label-free mass spectrometry-based proteomics data set derived from HeLa cell lines for general-purpose machine learning and analysis. Accessing and filtering data is a tedious task that takes up a considerable amount of researchers' time. We therefore provide machine-based metadata for easy selection and overview, alongside the 7,444 raw files and the MaxQuant search output. For convenience, we provide three filtered and aggregated development datasets at the protein group, peptide, and precursor levels. In addition to easy-to-access training data, we provide an SDRF file that annotates each raw file with its instrument settings, allowing automated reprocessing. To encourage others to extend this data set with instrument runs of further HeLa samples from different machine types, we provide our workflows and analysis scripts.
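
As an illustration of how per-file metadata can support selection, the following minimal sketch filters raw files by instrument using a tab-separated SDRF-style table. It is not the authors' workflow: the rows, file names, and instrument values are invented for the example, the helper `select_raw_files` is hypothetical, and real SDRF files typically encode instruments as structured terms rather than plain names. Only the column names `comment[data file]` and `comment[instrument]` follow the SDRF-Proteomics convention.

```python
import csv
import io

# Hypothetical SDRF excerpt (tab-separated). File names and instrument
# values are invented for illustration and are not taken from the data set.
SDRF_TEXT = (
    "source name\tcomment[data file]\tcomment[instrument]\n"
    "HeLa-1\trun_001.raw\tQ Exactive HF\n"
    "HeLa-2\trun_002.raw\tOrbitrap Exploris 480\n"
    "HeLa-3\trun_003.raw\tQ Exactive HF\n"
)


def select_raw_files(sdrf_text, instrument):
    """Return the data files whose instrument annotation equals `instrument`."""
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    return [
        row["comment[data file]"]
        for row in reader
        if row["comment[instrument]"] == instrument
    ]


selected = select_raw_files(SDRF_TEXT, "Q Exactive HF")
print(selected)
```

The same pattern extends to other annotation columns, so a subset of runs can be chosen before any raw file is downloaded or reprocessed.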
