Mass spectrometry-based proteomics data from thousands of HeLa control samples.

Affiliations

Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen N, Denmark.

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.

Publication information

Sci Data. 2024 Jan 23;11(1):112. doi: 10.1038/s41597-024-02922-z.

Abstract

Here we provide a curated, large-scale, label-free, mass spectrometry-based proteomics data set derived from HeLa cell lines for general-purpose machine learning and analysis. Accessing and filtering data are tedious tasks that take up a considerable amount of researchers' time. We therefore provide machine-based metadata for easy selection and overview, alongside the 7,444 raw files and the MaxQuant search output. For convenience, we provide three filtered and aggregated development datasets at the protein group, peptide, and precursor levels. In addition to providing easy-to-access training data, we provide an SDRF file that annotates each raw file with its instrument settings, allowing automated reprocessing. To encourage others to enlarge this data set with instrument runs of further HeLa samples from different machine types, we provide our workflows and analysis scripts.
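As a rough illustration of how an SDRF annotation can support file selection, the sketch below groups raw files by instrument. It assumes the tab-separated SDRF-Proteomics layout with the columns `comment[instrument]` and `comment[data file]`. The sample rows, file names, and the helper `files_by_instrument` are hypothetical and are not taken from the published data set or its scripts.

```python
import csv
import io
from collections import defaultdict

# Hypothetical SDRF excerpt: the file names and instrument values below are
# invented for illustration and do not come from the published data set.
SDRF_TEXT = (
    "source name\tcomment[instrument]\tcomment[data file]\n"
    "HeLa 1\tNT=Q Exactive HF-X\texample_run_001.raw\n"
    "HeLa 2\tNT=Orbitrap Exploris 480\texample_run_002.raw\n"
    "HeLa 3\tNT=Q Exactive HF-X\texample_run_003.raw\n"
)


def files_by_instrument(sdrf_handle):
    """Group raw file names by the instrument annotated in an SDRF table."""
    reader = csv.DictReader(sdrf_handle, delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        groups[row["comment[instrument]"]].append(row["comment[data file]"])
    return dict(groups)


groups = files_by_instrument(io.StringIO(SDRF_TEXT))
for instrument, files in sorted(groups.items()):
    # Print each instrument with the number of runs and their file names.
    print(instrument, len(files), files)
```

With a real SDRF file, the in-memory string would be replaced by an open file handle (for example `open(path, newline="")`); the column names should be checked against the actual header first.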

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5474/10806275/97398a58719c/41597_2024_2922_Fig1_HTML.jpg
