Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, 400065 Chongqing, China.
Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States.
J Proteome Res. 2022 Jun 3;21(6):1566-1574. doi: 10.1021/acs.jproteome.2c00069. Epub 2022 May 13.
Spectrum clustering is a powerful strategy to minimize redundant mass spectra by grouping them based on similarity, with the aim of forming groups of mass spectra from the same repeatedly measured analytes. Each such group of near-identical spectra can be represented by its so-called consensus spectrum for downstream processing. Although several algorithms for spectrum clustering have been adequately benchmarked and tested, the influence of the consensus spectrum generation step is rarely evaluated. Here, we present an implementation and benchmark of common consensus spectrum algorithms, including spectrum averaging, spectrum binning, the most similar spectrum, and the best-identified spectrum. We analyzed diverse public data sets using two different clustering algorithms (spectra-cluster and MaRaCluster) to evaluate how the consensus spectrum generation procedure influences downstream peptide identification. The BEST and BIN methods were found to be the most reliable methods for consensus spectrum generation, including for data sets with post-translational modifications (PTMs) such as phosphorylation. All source code and data of the present study are freely available on GitHub at https://github.com/statisticalbiotechnology/representative-spectra-benchmark.
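To make the consensus strategies named above more concrete, the following is a minimal Python sketch of three of them: a binning-based consensus, the most similar spectrum, and the best-identified spectrum. It is an illustration under stated assumptions, not the code from the linked repository. The function names, the `bin_width` and `min_fraction` parameters, the use of cosine similarity, and the choice to average intensities over all cluster members are all hypothetical choices made for this example.

```python
from collections import defaultdict
from math import sqrt

# A spectrum is modeled here as a list of (m/z, intensity) pairs (an assumption).
Spectrum = list[tuple[float, float]]


def bin_spectrum(spectrum: Spectrum, bin_width: float) -> dict[int, float]:
    """Sum the intensities of all peaks that fall into the same fixed-width m/z bin."""
    binned: dict[int, float] = defaultdict(float)
    for mz, intensity in spectrum:
        binned[int(mz / bin_width)] += intensity
    return dict(binned)


def cosine(a: dict[int, float], b: dict[int, float]) -> float:
    """Cosine similarity between two binned spectra (0.0 if either is empty)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def bin_consensus(cluster: list[Spectrum], bin_width: float = 0.02,
                  min_fraction: float = 0.5) -> Spectrum:
    """Binning-style consensus: keep bins seen in at least `min_fraction` of the
    spectra and report the mean intensity over all cluster members."""
    sums: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for spectrum in cluster:
        for bin_index, intensity in bin_spectrum(spectrum, bin_width).items():
            sums[bin_index] += intensity
            counts[bin_index] += 1
    n = len(cluster)
    # Each retained bin is reported at its center m/z.
    return sorted(
        ((bin_index + 0.5) * bin_width, sums[bin_index] / n)
        for bin_index in sums
        if counts[bin_index] / n >= min_fraction
    )


def most_similar_index(cluster: list[Spectrum], bin_width: float = 0.02) -> int:
    """Index of the spectrum with the highest summed similarity to all others."""
    binned = [bin_spectrum(spectrum, bin_width) for spectrum in cluster]
    totals = [
        sum(cosine(a, b) for j, b in enumerate(binned) if j != i)
        for i, a in enumerate(binned)
    ]
    return max(range(len(cluster)), key=totals.__getitem__)


def best_identified_index(scores: list[float]) -> int:
    """Index of the spectrum with the highest identification score
    (assumes higher scores are better)."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy cluster: two near-identical spectra and one outlier.
cluster = [
    [(100.2, 10.0), (200.3, 20.0)],
    [(100.4, 12.0), (200.1, 18.0)],
    [(100.3, 1.0), (300.5, 50.0)],
]
consensus = bin_consensus(cluster, bin_width=1.0)
representative = most_similar_index(cluster, bin_width=1.0)
best = best_identified_index([0.5, 0.9, 0.1])
```

In this toy cluster the peak near m/z 300 appears in only one of the three spectra, so the binning consensus drops it, while the most-similar strategy returns one of the two near-identical members rather than the outlier. The first two strategies need only the spectra themselves, whereas the best-identified strategy additionally needs a search-engine score for each cluster member.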