Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC 20007, USA.
J Proteome Res. 2013 Aug 2;12(8):3652-66. doi: 10.1021/pr400196s. Epub 2013 Jul 22.
Glycosylation is a common protein modification with a significant role in many vital cellular processes and human diseases, making the characterization of protein-attached glycan structures important for understanding cell biology and disease processes. Direct analysis of protein N-glycosylation by tandem mass spectrometry of glycopeptides promises site-specific elucidation of N-glycan microheterogeneity, something that analyses of detached N-glycans or deglycosylated peptides cannot provide. However, successful implementation of direct N-glycopeptide analysis by tandem mass spectrometry remains a challenge. In this work, we consider algorithmic techniques for the analysis of LC-MS/MS data acquired from glycopeptide-enriched fractions of enzymatic digests of purified proteins. We implement a computational strategy that takes advantage of the properties of CID fragmentation spectra of N-glycopeptides, matching the MS/MS spectra to peptide-glycan pairs drawn from protein sequence and glycan structure databases. Significantly, we also propose a novel false discovery rate estimation technique to estimate and manage the number of false identifications. We use a human glycoprotein standard, haptoglobin, digested with trypsin and GluC, enriched for glycopeptides using hydrophilic interaction liquid chromatography (HILIC), and analyzed by LC-MS/MS to demonstrate our algorithmic strategy and evaluate its performance. Our software, GlycoPeptideSearch (GPS), assigned glycopeptide identifications to 246 spectra at a false discovery rate of 5.58%, identifying 42 distinct haptoglobin peptide-glycan pairs covering each of the four haptoglobin N-linked glycosylation sites. We further demonstrate the effectiveness of this approach by analyzing plasma-derived haptoglobin, identifying 136 N-linked glycopeptide spectra at a false discovery rate of 0.4%, representing 15 distinct glycopeptides on at least three of the four N-linked glycosylation sites.
The software, GlycoPeptideSearch, is available for download from http://edwardslab.bmcb.georgetown.edu/GPS .
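The matching of spectra to peptide-glycan pairs described in the abstract can be illustrated, at the precursor-mass level only, by a minimal Python sketch. It is not the GPS implementation: the function names (`peptide_mass`, `glycan_mass`, `has_sequon`, `candidate_pairs`), the 10 ppm tolerance, the toy peptide strings and the glycan compositions are hypothetical choices made for illustration. The sketch assumes unmodified residues (no cysteine alkylation or methionine oxidation), only checks for an N-X-S/T sequon lying entirely inside the peptide, and omits both the CID fragment-spectrum scoring and the false discovery rate estimation that the abstract describes.

```python
import re
from itertools import product

# Monoisotopic masses in Da.
WATER = 18.010565
PROTON = 1.007276

# Unmodified amino-acid residue masses (assumption: no fixed or variable mods).
AA_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# Monosaccharide residue masses (anhydro form, as linked in a glycan).
GLYCAN_RESIDUE = {
    "Hex": 162.05282, "HexNAc": 203.07937,
    "dHex": 146.05791, "NeuAc": 291.09542,
}

# N-X-S/T with X != P, restricted here to sequons fully inside the peptide.
SEQUON = re.compile(r"N[^P][ST]")


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(AA_RESIDUE[aa] for aa in seq) + WATER


def glycan_mass(composition: dict[str, int]) -> float:
    """Mass an attached N-glycan adds to the peptide: the sum of residue masses."""
    return sum(GLYCAN_RESIDUE[name] * n for name, n in composition.items())


def precursor_mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]^z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


def has_sequon(seq: str) -> bool:
    return SEQUON.search(seq) is not None


def candidate_pairs(observed_mz, charge, peptides, glycans, tol_ppm=10.0):
    """Return (peptide, glycan name, error in ppm) for every sequon-bearing
    peptide and glycan composition whose combined precursor m/z lies within
    tol_ppm of the observed precursor m/z."""
    hits = []
    for pep, (name, comp) in product(
        (p for p in peptides if has_sequon(p)), glycans.items()
    ):
        theo = precursor_mz(peptide_mass(pep) + glycan_mass(comp), charge)
        err_ppm = (observed_mz - theo) / theo * 1e6
        if abs(err_ppm) <= tol_ppm:
            hits.append((pep, name, err_ppm))
    return hits


# Hypothetical toy inputs, not haptoglobin data.
PEPTIDES = ["AANGTK", "PEPTIDE", "LLNVSR"]
GLYCANS = {
    "HexNAc4Hex5": {"HexNAc": 4, "Hex": 5},
    "HexNAc4Hex5NeuAc1": {"HexNAc": 4, "Hex": 5, "NeuAc": 1},
    "HexNAc4Hex5NeuAc2": {"HexNAc": 4, "Hex": 5, "NeuAc": 2},
    "HexNAc4Hex5dHex1NeuAc2": {"HexNAc": 4, "Hex": 5, "dHex": 1, "NeuAc": 2},
}

if __name__ == "__main__":
    obs = precursor_mz(
        peptide_mass("AANGTK") + glycan_mass(GLYCANS["HexNAc4Hex5NeuAc2"]), 3
    )
    print(candidate_pairs(obs, 3, PEPTIDES, GLYCANS))
```

Enumerating the full peptide × glycan product is the simplest correct approach for a single purified protein and a modest glycan list; a proteome-scale search would more likely sort candidate masses and use a binary search on the precursor window.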