Hogan Reuben A, Pepi Lauren E, Riley Nicholas M, Chalkley Robert J
University of California, San Francisco.
Beth Israel Deaconess Medical Center, Harvard Medical School.
bioRxiv. 2024 Jul 25:2024.07.24.604997. doi: 10.1101/2024.07.24.604997.
Glycoproteomics is a rapidly developing field, and several technological innovations have stimulated its data analysis. As a result, there are many software tools to choose from, each with unique features that make direct comparison difficult. This work presents a head-to-head comparison of five modern software tools: Byonic, Protein Prospector, MSFraggerGlyco, pGlyco3, and GlycoDecipher. To enable a meaningful comparison, differences in search parameters were minimized. One potential confounding variable is the glycan database that informs glycoproteomic searches, so we performed glycomic profiling of the samples and used the output to construct a matched glycan database for each tool. Up to 19,000 glycopeptide spectra were identified across three replicates of wild-type SH-SY5Y cells. Most tools overlapped substantially in the glycoproteins identified, the locations of glycosites, and the glycans, although Byonic reported a suspiciously large number of glycoproteins and glycosites of questionable reliability. We show that Protein Prospector identified the most glycopeptide spectrum matches, with high agreement to known glycosites in UniProt. Overall, our results indicate that glycoproteomic searches should involve more than one software tool to generate confidence. It may be useful to combine tools that take a peptide-first approach with tools that take a glycan-first approach.
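The recommendation to use more than one search tool can be illustrated with a minimal sketch. It assumes each tool's results have already been reduced to a set of (protein accession, site position) pairs; the function names, the tool labels, and the toy data are all hypothetical and are not taken from the study. It shows one simple way to quantify pairwise overlap (Jaccard index) and to keep only glycosites reported by at least two tools.

```python
from collections import Counter
from itertools import combinations


def pairwise_jaccard(sites_by_tool):
    """Return the Jaccard index of reported glycosites for each pair of tools."""
    scores = {}
    for a, b in combinations(sorted(sites_by_tool), 2):
        sa, sb = sites_by_tool[a], sites_by_tool[b]
        union = sa | sb
        scores[(a, b)] = len(sa & sb) / len(union) if union else 0.0
    return scores


def consensus_sites(sites_by_tool, min_tools=2):
    """Return the glycosites reported by at least `min_tools` tools."""
    counts = Counter(site for sites in sites_by_tool.values() for site in sites)
    return {site for site, n in counts.items() if n >= min_tools}


# Toy data: placeholder accessions and positions, not real results.
example = {
    "ToolA": {("P1", 10), ("P2", 55), ("P3", 7)},
    "ToolB": {("P1", 10), ("P2", 55)},
    "ToolC": {("P1", 10), ("P4", 99)},
}

overlap = pairwise_jaccard(example)
shared = consensus_sites(example, min_tools=2)
```

Requiring agreement between two tools is only one possible consensus rule; a stricter threshold, or a rule that demands one peptide-first and one glycan-first tool, would follow the same pattern.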