A multi-species benchmark for training and validating mass spectrometry proteomics machine learning models.

Affiliations

Department of Genome Sciences, University of Washington, Seattle, WA, USA.

Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.

Publication

Sci Data. 2024 Nov 8;11(1):1207. doi: 10.1038/s41597-024-04068-4.

Abstract

Training machine learning models for tasks such as de novo sequencing or spectral clustering requires large collections of confidently identified spectra. Here we describe a dataset of 2.8 million high-confidence peptide-spectrum matches derived from nine different species. The dataset is based on a previously described benchmark but has been re-processed to ensure consistent data quality and enforce separation of training and test peptides.
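A minimal sketch can illustrate what enforcing separation of training and test peptides means in practice. It is not the authors' reprocessing pipeline. It assumes PSMs are available as hypothetical (spectrum_id, peptide) pairs, and it uses SHA-256 hash bucketing, a technique chosen here only for illustration. Because the split is decided per peptide rather than per spectrum, every spectrum of a given peptide lands in the same split.

```python
"""Hypothetical sketch: split PSMs so that no peptide appears in both train and test.

Illustrative only; PSMs are assumed to be (spectrum_id, peptide) tuples, and the
hash-bucketing scheme is an assumption, not the dataset authors' procedure.
"""
import hashlib
from collections import defaultdict


def peptide_split(peptide: str, test_fraction: float = 0.1) -> str:
    """Deterministically assign a peptide sequence to 'train' or 'test'.

    Hashing the sequence (rather than sampling each PSM independently) ensures
    that all spectra of the same peptide end up in the same split.
    """
    digest = hashlib.sha256(peptide.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Compare against the threshold as an integer/float comparison, which
    # Python evaluates exactly, so test_fraction=0.0 and 1.0 behave as expected.
    return "test" if bucket < test_fraction * 2**64 else "train"


def split_psms(psms, test_fraction: float = 0.1):
    """Group (spectrum_id, peptide) pairs into peptide-disjoint splits."""
    splits = defaultdict(list)
    for spectrum_id, peptide in psms:
        splits[peptide_split(peptide, test_fraction)].append((spectrum_id, peptide))
    return dict(splits)


if __name__ == "__main__":
    # Hypothetical toy PSMs: two spectra share the peptide PEPTIDEK.
    psms = [("s1", "PEPTIDEK"), ("s2", "PEPTIDEK"),
            ("s3", "LESLIEK"), ("s4", "ELVISLIVESK")]
    splits = split_psms(psms, test_fraction=0.5)
    train_peps = {p for _, p in splits.get("train", [])}
    test_peps = {p for _, p in splits.get("test", [])}
    print("peptides shared between splits:", train_peps & test_peps)
```

Because the assignment depends only on the peptide string, it is reproducible across runs and files. In practice one would likely normalize sequences before hashing (for example, treating modified forms of a peptide consistently), which this sketch does not attempt.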
