Bioinformatics Program, Boston University, Boston, MA 02215, USA.
Department of Biochemistry, School of Medicine, Boston University, Boston, MA 02215, USA.
Molecules. 2021 Aug 6;26(16):4757. doi: 10.3390/molecules26164757.
Protein glycosylation, which mediates interactions among viral proteins, host receptors, and immune molecules, is an important consideration for predicting viral antigenicity. Viral spike proteins, which are responsible for host cell invasion, are especially important to examine. However, the field of glycoproteomics lacks consensus on identification strategy and false discovery rate (FDR) calculation, which impedes such examinations. As a case study in the overlap between software, we examine recently published SARS-CoV-2 glycoprotein datasets with four glycoproteomics identification software tools, each run with its recommended protocol: GlycReSoft, Byonic, pGlyco2, and MSFragger-Glyco. These tools use different forms of Target-Decoy Analysis (TDA) to estimate FDR and have different database-oriented search methods with varying degrees of quantification capability. Instead of an ideal overlap between software, we observed different sets of identifications that only partly intersect. When clustering by glycopeptide identifications, we see higher degrees of relatedness within a software tool than within a glycosite. Taking the consensus between results yields a conservative and non-informative conclusion, because identifications are lost in the desire for caution; these non-consensus identifications are often of lower abundance and, therefore, more susceptible to nuanced changes. We conclude that present glycoproteomics software tools are not directly comparable, and that methods are needed to assess their overall results and FDR estimation performance. Once such tools are developed, it will be possible to improve FDR methods and quantify complex glycoproteomes with acceptable confidence, rather than in potentially misleading broad strokes.
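The comparison described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the glycopeptide labels are made-up toy data, the helper names `decoy_fdr` and `jaccard` are hypothetical, and the FDR formula is the simplest decoy-to-target ratio, whereas the four tools each use their own TDA variant. The sketch shows only how a consensus (intersection) across tools can be much smaller than the union of identifications, and how pairwise overlap can be scored.

```python
from itertools import combinations


def decoy_fdr(n_targets: int, n_decoys: int) -> float:
    """Simplest target-decoy FDR estimate: decoy hits / target hits
    above a score threshold (an assumption; each tool differs)."""
    if n_targets == 0:
        return 0.0
    return n_decoys / n_targets


def jaccard(a: set, b: set) -> float:
    """Jaccard index: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy identifications per tool (hypothetical labels, not data from the study).
ids = {
    "GlycReSoft": {"g1", "g2", "g3", "g4"},
    "Byonic": {"g1", "g2", "g5"},
    "pGlyco2": {"g1", "g3", "g5", "g6"},
    "MSFragger-Glyco": {"g1", "g2", "g3", "g7"},
}

# Consensus keeps only what every tool reports; the union keeps everything.
consensus = set.intersection(*ids.values())
union = set.union(*ids.values())

# Pairwise overlap between tools.
pairwise = {
    (a, b): jaccard(ids[a], ids[b]) for a, b in combinations(ids, 2)
}

print(f"consensus: {len(consensus)} of {len(union)} identifications")
for (a, b), score in pairwise.items():
    print(f"{a} vs {b}: Jaccard = {score:.2f}")
```

In this toy case the consensus keeps a single identification out of seven, which mirrors the abstract's point that a strict intersection is conservative and discards most identifications.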