Implementation and application of a versatile clustering tool for tandem mass spectrometry data.

Author information

Flikka Kristian, Meukens Jeroen, Helsens Kenny, Vandekerckhove Joël, Eidhammer Ingvar, Gevaert Kris, Martens Lennart

Affiliations

Computational Biology Unit, Bergen Center for Computational Science, University of Bergen, Bergen, Norway.

Publication information

Proteomics. 2007 Sep;7(18):3245-58. doi: 10.1002/pmic.200700160.

DOI:10.1002/pmic.200700160

PMID:17708593

Abstract

High-throughput proteomics experiments typically generate large amounts of peptide fragmentation mass spectra during a single experiment. There is often a substantial amount of redundant fragmentation of the same precursors among these spectra, which is usually considered a nuisance. We here discuss the potential of clustering and merging redundant spectra to turn this redundancy into a useful property of the dataset. To this end, we have created the first general-purpose, freely available open-source software application for clustering and merging MS/MS spectra. The application also introduces a novel approach to calculating the similarity of fragmentation mass spectra that takes into account the increased precision of modern mass spectrometers, and we suggest a simple but effective improvement to single-linkage clustering. The application and the novel algorithms are applied to several real-life proteomic datasets and the results are discussed. An analysis of the influence of the different algorithms available and their parameters is given, as well as a number of important applications of the overall approach.
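To make the idea of clustering and merging redundant spectra concrete, here is a minimal Python sketch. It is not the application described in the abstract and does not reproduce its novel similarity measure or its improvement to single-linkage clustering, neither of which is specified here. Everything in it is an assumption made for illustration: the `Spectrum` representation, the helper names `shared_peak_fraction`, `single_linkage` and `merge_cluster`, the shared-peak similarity score, and the default m/z tolerance of 0.5.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a spectrum is a list of (m/z, intensity)
# peaks sorted by m/z, with strictly positive intensities.
Spectrum = List[Tuple[float, float]]


def shared_peak_fraction(a: Spectrum, b: Spectrum, tol: float = 0.5) -> float:
    """Illustrative similarity: peaks matched within +/- tol m/z,
    divided by the peak count of the larger spectrum."""
    i = j = matched = 0
    while i < len(a) and j < len(b):
        diff = a[i][0] - b[j][0]
        if abs(diff) <= tol:
            matched += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1
        else:
            j += 1
    longest = max(len(a), len(b))
    return matched / longest if longest else 0.0


def single_linkage(spectra: List[Spectrum], threshold: float,
                   tol: float = 0.5) -> List[List[int]]:
    """Plain single-linkage grouping: two spectra end up in the same
    cluster if a chain of pairs with similarity >= threshold joins them."""
    parent = list(range(len(spectra)))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(spectra)):
        for j in range(i + 1, len(spectra)):
            if shared_peak_fraction(spectra[i], spectra[j], tol) >= threshold:
                parent[find(j)] = find(i)

    clusters: Dict[int, List[int]] = {}
    for idx in range(len(spectra)):
        clusters.setdefault(find(idx), []).append(idx)
    return sorted(clusters.values())


def merge_cluster(spectra: List[Spectrum], tol: float = 0.5) -> Spectrum:
    """Illustrative merge: pool all peaks, then fold each peak into the
    previous merged peak when it lies within tol of it, summing the
    intensities and taking the intensity-weighted mean m/z."""
    merged: Spectrum = []
    for mz, inten in sorted(p for s in spectra for p in s):
        if merged and mz - merged[-1][0] <= tol:
            prev_mz, prev_int = merged[-1]
            total = prev_int + inten
            merged[-1] = ((prev_mz * prev_int + mz * inten) / total, total)
        else:
            merged.append((mz, inten))
    return merged


if __name__ == "__main__":
    s1 = [(100.0, 10.0), (200.0, 20.0), (300.0, 5.0)]
    s2 = [(100.1, 12.0), (200.2, 18.0), (300.1, 6.0)]
    s3 = [(150.0, 10.0), (250.0, 20.0), (350.0, 5.0)]
    groups = single_linkage([s1, s2, s3], threshold=0.6)
    print(groups)
    print(merge_cluster([s1, s2]))
```

Plain single linkage, as sketched here, can chain dissimilar spectra together through intermediate ones, which is presumably the kind of behaviour an improvement to single-linkage clustering would target. A real tool would also restrict comparisons to spectra with compatible precursor masses rather than comparing all pairs.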
