Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA.
Department of Genome Sciences, University of Washington, Seattle, WA, USA.
Nat Methods. 2022 Jun;19(6):675-678. doi: 10.1038/s41592-022-01496-1. Epub 2022 May 30.
Computational methods that aim to exploit publicly available mass spectrometry repositories rely primarily on unsupervised clustering of spectra. Here we trained a deep neural network in a supervised fashion on the basis of previous assignments of peptides to spectra. The network, called 'GLEAMS', learns to embed spectra in a low-dimensional space in which spectra generated by the same peptide are close to one another. We applied GLEAMS for large-scale spectrum clustering, detecting groups of unidentified, proximal spectra representing the same peptide. We used these clusters to explore the dark proteome of repeatedly observed yet consistently unidentified mass spectra.
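The idea of an embedding in which same-peptide spectra lie close together can be illustrated with a small sketch. The abstract does not state the training objective or the clustering algorithm, so everything below is an assumption made for illustration: a standard pairwise contrastive loss on Euclidean distance, and a simple distance-threshold (single-linkage) grouping. The function names, the `margin` parameter, and the `threshold` parameter are hypothetical and are not taken from GLEAMS.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_peptide, margin=1.0):
    """Pairwise contrastive loss (an assumed objective, not confirmed by the abstract).

    Pairs from the same peptide are penalized by their squared distance;
    pairs from different peptides are penalized only if closer than `margin`.
    """
    d = euclidean(emb_a, emb_b)
    if same_peptide:
        return d ** 2
    return max(0.0, margin - d) ** 2


def cluster_by_distance(embeddings, threshold):
    """Group embeddings whose pairwise distance is <= threshold (single linkage).

    Uses a small union-find structure; returns one cluster label per embedding.
    The O(n^2) pairwise loop is for illustration only and would not scale
    to repository-sized data.
    """
    n = len(embeddings)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if euclidean(embeddings[i], embeddings[j]) <= threshold:
                parent[find(i)] = find(j)

    return [find(i) for i in range(n)]


if __name__ == "__main__":
    # Toy 2-D embeddings: the first two are close, the third is far away.
    toy = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
    print(cluster_by_distance(toy, threshold=0.5))
    print(contrastive_loss(toy[0], toy[1], same_peptide=True))
```

In this toy setup, unidentified spectra that fall into the same group as one another would be candidates for representing the same peptide, which is the kind of cluster the abstract describes exploring.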