Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Währinger Gürtel 18-20, Vienna 1090, Austria.
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.
J Proteome Res. 2018 May 4;17(5):1993-1996. doi: 10.1021/acs.jproteome.7b00824. Epub 2018 Apr 25.
In the recent benchmarking article entitled "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra", Rieder et al. compared several approaches to clustering MS/MS spectra. While we recognize the value of the manuscript, we report here some shortcomings that we detected in the original analyses. First, for most analyses the authors clustered only single MS/MS runs. In one of the reported analyses, three MS/MS runs were processed together, and this already caused computational performance issues in many of the tested approaches. This highlights how difficult it is to apply many of the tested algorithms to proteomics data sets of the size typically produced today. Second, the authors processed only identified spectra when merging MS runs. As a result, all unidentified spectra that are of lower quality had already been removed from the data set and could not influence the clustering results. Third, the authors did not analyze the effect of chimeric spectra on the clustering results. In our analysis, 3% of the spectra in the data sets used were chimeric, and this had marked effects on the behavior of the clustering algorithms tested. Finally, the authors evaluated the MS-Cluster and spectra-cluster algorithms using only a precursor tolerance of 5 Da on high-resolution Orbitrap data, a choice that was, in our opinion, not adequate to assess the performance of MS/MS clustering approaches.
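To give a sense of scale for the last point, the short Python sketch below converts an absolute precursor tolerance into parts per million at a few precursor m/z values. It is only an illustration: the function names, the example m/z values, and the 10 ppm comparison figure are assumptions made here and are not taken from the article, and the 5 Da tolerance is treated as a window on the precursor m/z axis.

```python
def da_to_ppm(tolerance_da: float, precursor_mz: float) -> float:
    """Express an absolute tolerance (Da, applied on the m/z axis) in ppm."""
    return tolerance_da / precursor_mz * 1e6


def ppm_to_da(tolerance_ppm: float, precursor_mz: float) -> float:
    """Express a relative tolerance (ppm) as an absolute window in Da."""
    return tolerance_ppm * precursor_mz / 1e6


if __name__ == "__main__":
    ASSUMED_PPM = 10.0  # hypothetical high-resolution setting, for comparison only
    for mz in (400.0, 800.0, 1200.0):  # illustrative precursor m/z values
        wide = da_to_ppm(5.0, mz)
        narrow = ppm_to_da(ASSUMED_PPM, mz)
        print(f"m/z {mz:7.1f}: 5 Da = {wide:8.1f} ppm; "
              f"{ASSUMED_PPM:.0f} ppm = {narrow:.4f} Da")
```

Under these assumptions, a 5 Da window is several hundred times wider than a tolerance of about 10 ppm across the whole m/z range shown, so it admits many more candidate spectra into each comparison.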