Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California, USA.
Nat Methods. 2011 May 15;8(7):587-91. doi: 10.1038/nmeth.1609.
Tandem mass spectrometry (MS/MS) experiments yield multiple, nearly identical spectra of the same peptide in various laboratories, but proteomics researchers typically do not leverage the unidentified spectra produced in other labs to decode the spectra they generate. We propose a spectral archives approach that clusters MS/MS datasets, representing similar spectra by a single consensus spectrum. Spectral archives extend spectral libraries by analyzing identified and unidentified spectra in the same way and by maintaining information about peptide spectra that are common across species and conditions. Archives thus offer both traditional, library-style spectrum similarity search and new ways to analyze the data. By developing a clustering tool, MS-Cluster, we generated a spectral archive from ∼1.18 billion spectra that greatly exceeds the size of existing spectral repositories. We advocate that publicly available data should be organized into spectral archives rather than analyzed as disparate datasets, as is mostly the case today.
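The idea of representing similar spectra by a single consensus spectrum can be illustrated with a small sketch. This is not the MS-Cluster algorithm, whose details are not given here. It is a minimal greedy clustering under stated assumptions: spectra are lists of (m/z, intensity) peaks, similarity is the cosine between binned intensity vectors, and the consensus is the renormalized sum of its members. The bin width, the similarity threshold, and all function names are hypothetical choices made for illustration, and precursor mass is ignored.

```python
from collections import defaultdict
from math import sqrt

BIN_WIDTH = 1.0       # assumed m/z bin width in Da (illustrative only)
SIM_THRESHOLD = 0.7   # assumed cosine cutoff for joining a cluster (illustrative only)


def normalize(vec):
    """Scale a sparse {bin: intensity} vector to unit length."""
    norm = sqrt(sum(v * v for v in vec.values()))
    return {b: v / norm for b, v in vec.items()} if norm else {}


def bin_spectrum(peaks, bin_width=BIN_WIDTH):
    """Turn (mz, intensity) peaks into a unit-length sparse vector."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz / bin_width)] += intensity
    return normalize(vec)


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def consensus(members):
    """Sum the member vectors and renormalize to get a consensus spectrum."""
    total = defaultdict(float)
    for vec in members:
        for b, v in vec.items():
            total[b] += v
    return normalize(total)


def cluster(spectra, threshold=SIM_THRESHOLD):
    """Greedy single pass: join the most similar cluster or start a new one."""
    clusters = []  # each cluster: {"members": [...], "consensus": {...}}
    for peaks in spectra:
        vec = bin_spectrum(peaks)
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["consensus"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"members": [vec], "consensus": vec})
        else:
            best["members"].append(vec)
            best["consensus"] = consensus(best["members"])
    return clusters


# Toy data: the first two spectra share peaks, the third does not.
spectra = [
    [(100.1, 10.0), (200.2, 50.0), (300.3, 20.0)],
    [(100.2, 12.0), (200.3, 48.0), (300.1, 22.0)],
    [(150.5, 30.0), (250.5, 30.0), (350.5, 30.0)],
]
archive = cluster(spectra)
print([len(c["members"]) for c in archive])  # prints [2, 1]
```

Note that identified and unidentified spectra are treated identically in this sketch, since no peptide annotation is needed to join a cluster. A comparison against every existing consensus, as done here, would not scale to billions of spectra; a practical tool would need to restrict candidate clusters, for example by precursor mass.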