Deep learning embedder method and tool for mass spectra similarity search.

Authors

Qin Chunyuan, Luo Xiyang, Deng Chuan, Shu Kunxian, Zhu Weimin, Griss Johannes, Hermjakob Henning, Bai Mingze, Perez-Riverol Yasset

Affiliations

Chongqing Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China.

State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China.

Publication information

J Proteomics. 2021 Feb 10;232:104070. doi: 10.1016/j.jprot.2020.104070. Epub 2020 Dec 8.

DOI:10.1016/j.jprot.2020.104070
PMID:33307250
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7613299/
Abstract

Spectral similarity calculation is widely used in protein identification tools and mass spectra clustering algorithms when comparing theoretical or experimental spectra. The performance of the spectral similarity calculation plays an important role in these tools and algorithms, especially in the analysis of large-scale datasets. Recently, deep learning methods have been proposed to improve the performance of clustering algorithms and protein identification by training the algorithms with existing data and by using multiple spectra and identified peptide features. While the efficiency of these algorithms is still under study in comparison with traditional approaches, their application in proteomics data analysis is becoming more common. Here, we propose the use of deep learning to improve spectral similarity comparison. We assessed the performance of deep learning for spectral similarity with GLEAMS and a newly trained embedder model (DLEAMSE), which uses high-quality spectra from PRIDE Cluster. We also developed a new bioinformatics tool (mslookup - https://github.com/bigbio/DLEAMSE/) that allows users to quickly search for spectra among previously identified mass spectra published in public repositories and spectral libraries. Finally, we released a human database to enable bioinformaticians and biologists to search for identified spectra on their own machines.

SIGNIFICANCE STATEMENT: Spectral similarity calculation plays an important role in proteomics data analysis. Because deep learning can learn implicit and effective features from large-scale training datasets, deep learning-based MS/MS spectra embedding models have emerged as a solution to improve the similarity calculations used in mass spectral clustering algorithms. We compare multiple similarity scoring and deep learning methods in terms of accuracy (computing the similarity for a pair of mass spectra) and computing-time performance. The benchmark results showed no major differences in accuracy between DLEAMSE and the normalized dot product (NDP) for spectrum similarity calculations. The DLEAMSE GPU implementation is faster than NDP in preprocessing on the GPU server, and the similarity calculation of DLEAMSE (Euclidean distance on 32-D vectors) takes about 1/3 of the time of dot product calculations. The deep learning model (DLEAMSE) encoding and embedding steps need to run only once for each spectrum, and the embedded 32-D points can be persisted in the repository, which speeds up later comparisons, particularly on large-scale data. Based on these results, we propose a new tool, mslookup, that enables researchers to find spectra previously identified in public data. The tool can also be used to generate in-house databases of previously identified spectra to share with other laboratories and consortia.
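The two similarity measures compared in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DLEAMSE or mslookup implementation: the 1 Da binning, the 2000 m/z cutoff, and the function names `bin_spectrum`, `normalized_dot_product`, `euclidean_distance`, and `nearest` are all hypothetical, and the 32-D vectors below are placeholders rather than embeddings produced by a trained model.

```python
import math


def bin_spectrum(peaks, bin_width=1.0, max_mz=2000.0):
    """Sum peak intensities into fixed-width m/z bins.

    The bin width and m/z cutoff are illustrative assumptions.
    """
    n_bins = int(max_mz / bin_width)
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < n_bins:
            vec[idx] += intensity
    return vec


def normalized_dot_product(a, b):
    """Dot product of two binned spectra divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def nearest(query, stored):
    """Return (spectrum_id, distance) for the stored embedding closest to query."""
    return min(
        ((sid, euclidean_distance(query, emb)) for sid, emb in stored.items()),
        key=lambda pair: pair[1],
    )


# Toy spectra as (m/z, intensity) pairs; the values are made up.
spectrum_a = [(100.1, 10.0), (200.2, 5.0)]
spectrum_b = [(100.3, 10.0), (350.0, 5.0)]
print(normalized_dot_product(bin_spectrum(spectrum_a), bin_spectrum(spectrum_b)))

# Placeholder 32-D vectors standing in for embeddings computed once and stored.
stored = {"spec_1": [0.0] * 32, "spec_2": [1.0] * 32}
print(nearest([0.9] * 32, stored))
```

The sketch also shows why stored embeddings help at scale: the binned vectors here have 2000 entries, while each stored embedding has 32, so every later comparison touches far fewer numbers once the embedding step has been paid for.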
