Flikka Kristian, Meukens Jeroen, Helsens Kenny, Vandekerckhove Joël, Eidhammer Ingvar, Gevaert Kris, Martens Lennart
Computational Biology Unit, Bergen Center for Computational Science, University of Bergen, Bergen, Norway.
Proteomics. 2007 Sep;7(18):3245-58. doi: 10.1002/pmic.200700160.
High-throughput proteomics experiments typically generate large amounts of peptide fragmentation mass spectra during a single experiment. There is often a substantial amount of redundant fragmentation of the same precursors among these spectra, which is usually considered a nuisance. We here discuss the potential of clustering and merging redundant spectra to turn this redundancy into a useful property of the dataset. To this end, we have created the first general-purpose, freely available open-source software application for clustering and merging MS/MS spectra. The application also introduces a novel approach to calculating the similarity of fragmentation mass spectra that takes into account the increased precision of modern mass spectrometers, and we suggest a simple but effective improvement to single-linkage clustering. The application and the novel algorithms are applied to several real-life proteomic datasets and the results are discussed. An analysis of the influence of the different algorithms available and their parameters is given, as well as a number of important applications of the overall approach.
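The abstract does not spell out the similarity measure or the suggested improvement to single-linkage clustering, so the following is only a minimal sketch of the general cluster-then-merge idea, not the application's actual method. It assumes a binned cosine similarity and plain single-linkage clustering as stand-ins; the spectrum layout, the function names, and all tolerances and thresholds (`bin_width`, `precursor_tol`, `min_similarity`) are hypothetical choices made for illustration.

```python
from collections import defaultdict
from math import sqrt

# Assumed layout: a spectrum is (precursor_mz, [(fragment_mz, intensity), ...]).
# This is an illustrative structure, not the application's data model.


def binned(peaks, bin_width):
    """Sum peak intensities into fixed-width m/z bins."""
    bins = defaultdict(float)
    for mz, intensity in peaks:
        bins[int(mz / bin_width)] += intensity
    return bins


def cosine_similarity(peaks_a, peaks_b, bin_width=0.5):
    """Cosine similarity of two binned spectra (a stand-in measure)."""
    a, b = binned(peaks_a, bin_width), binned(peaks_b, bin_width)
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_linkage_clusters(spectra, precursor_tol=0.5, min_similarity=0.8):
    """Plain single-linkage: link any pair that is similar enough.

    Two spectra join the same cluster if their precursor m/z values agree
    within precursor_tol and their similarity reaches min_similarity.
    Clusters are the connected components, tracked with union-find.
    """
    parent = list(range(len(spectra)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(spectra)):
        for j in range(i + 1, len(spectra)):
            mz_i, peaks_i = spectra[i]
            mz_j, peaks_j = spectra[j]
            if abs(mz_i - mz_j) > precursor_tol:
                continue
            if cosine_similarity(peaks_i, peaks_j) >= min_similarity:
                parent[find(j)] = find(i)

    clusters = defaultdict(list)
    for i in range(len(spectra)):
        clusters[find(i)].append(i)
    return sorted(clusters.values())


def merge_cluster(spectra, members, bin_width=0.5):
    """Merge a cluster into one spectrum by averaging binned intensities."""
    precursor = sum(spectra[i][0] for i in members) / len(members)
    totals = defaultdict(float)
    for i in members:
        for k, v in binned(spectra[i][1], bin_width).items():
            totals[k] += v
    # Report each bin at its centre m/z with the mean intensity.
    peaks = [((k + 0.5) * bin_width, v / len(members))
             for k, v in sorted(totals.items())]
    return precursor, peaks
```

Calling `single_linkage_clusters` on a list of spectra returns lists of indices, and each list can be passed to `merge_cluster` to obtain one representative spectrum per cluster. The pairwise loop is quadratic in the number of spectra, which is acceptable for a sketch but would need precursor-based pre-sorting or indexing on datasets of the size the abstract describes.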