Program in Bioinformatics, Boston University, Boston, Massachusetts, USA; Department of Biochemistry, Center for Biomedical Mass Spectrometry, Boston University School of Medicine, Boston, Massachusetts, USA.
Department of Biochemistry, Center for Biomedical Mass Spectrometry, Boston University School of Medicine, Boston, Massachusetts, USA.
Mol Cell Proteomics. 2021;20:100093. doi: 10.1016/j.mcpro.2021.100093. Epub 2021 May 14.
The sulfated glycosaminoglycans (GAGs) are long, linear polysaccharide chains that are typically found as the glycan portion of proteoglycans. These GAGs are characterized by repeating disaccharide units with variable sulfation and acetylation patterns along the chain. GAG length and modification patterns have profound impacts on growth factor signaling mechanisms central to numerous physiological processes. Electron-activated dissociation tandem mass spectrometry is an effective technique for assigning the structures of GAG saccharides; however, manual interpretation of the resulting complex tandem mass spectra is difficult and time-consuming, which motivates the development of computational methods for accurate and efficient sequencing. We have recently published GAGfinder, the first peak picking and elemental composition assignment algorithm specifically designed for GAG tandem mass spectra. Here, we present GAGrank, a novel network-based method for determining GAG structure from the information that GAGfinder extracts from tandem mass spectra. GAGrank is based on Google's PageRank algorithm for ranking websites for search engine output. In particular, it is an implementation of BiRank, an extension of PageRank for bipartite networks. In our implementation, one partition comprises every possible sequence for a given GAG composition, and the other comprises the tandem MS fragments found using GAGfinder. Sequences are given a higher ranking if they link to many important fragments. Using the simulated annealing probabilistic optimization technique, we optimized GAGrank's parameters on ten training sequences. We then validated GAGrank's performance on three validation sequences. We also demonstrated GAGrank's ability to sequence isomeric mixtures using two mixtures at five different ratios.
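The bipartite ranking idea described in the abstract can be illustrated with a minimal sketch of a generic BiRank-style iteration. This is not the GAGrank implementation: the function name `birank`, the toy edge list, the uniform priors, the unit edge weights, and the damping values `alpha` and `beta` are all assumptions chosen for illustration. The abstract does not state which weights, priors, or parameter values GAGrank actually uses.

```python
from math import sqrt

def birank(edges, n_seq, n_frag, seq_prior=None, frag_prior=None,
           alpha=0.85, beta=0.85, max_iter=1000, tol=1e-10):
    """Generic BiRank-style scoring on a bipartite sequence/fragment graph.

    edges maps (sequence_index, fragment_index) -> positive weight.
    All defaults here are illustrative assumptions, not GAGrank's settings.
    """
    # Weighted degree of every node in each partition.
    seq_deg = [0.0] * n_seq
    frag_deg = [0.0] * n_frag
    for (s, f), w in edges.items():
        seq_deg[s] += w
        frag_deg[f] += w

    # Symmetric normalization: w / sqrt(deg(sequence) * deg(fragment)).
    norm = {(s, f): w / sqrt(seq_deg[s] * frag_deg[f])
            for (s, f), w in edges.items()}

    # Uniform priors unless the caller supplies them (an assumption).
    seq_prior = seq_prior or [1.0 / n_seq] * n_seq
    frag_prior = frag_prior or [1.0 / n_frag] * n_frag
    seq_score, frag_score = list(seq_prior), list(frag_prior)

    for _ in range(max_iter):
        # Fragments collect score from the sequences that explain them.
        new_frag = [(1.0 - beta) * q for q in frag_prior]
        for (s, f), w in norm.items():
            new_frag[f] += beta * w * seq_score[s]

        # Sequences collect score from the fragments they link to.
        new_seq = [(1.0 - alpha) * q for q in seq_prior]
        for (s, f), w in norm.items():
            new_seq[s] += alpha * w * new_frag[f]

        # Total absolute change, used as the convergence criterion.
        delta = (sum(abs(a - b) for a, b in zip(new_seq, seq_score)) +
                 sum(abs(a - b) for a, b in zip(new_frag, frag_score)))
        seq_score, frag_score = new_seq, new_frag
        if delta < tol:
            break
    return seq_score, frag_score

# Hypothetical toy graph: three candidate sequences, three observed fragments.
# Candidate 0 is consistent with all fragments, candidate 1 with two, candidate 2 with one.
toy_edges = {
    (0, 0): 1.0, (0, 1): 1.0, (0, 2): 1.0,
    (1, 0): 1.0, (1, 1): 1.0,
    (2, 0): 1.0,
}
seq_scores, frag_scores = birank(toy_edges, n_seq=3, n_frag=3)

# Candidate indices ordered from highest to lowest score.
ranking = sorted(range(len(seq_scores)), key=lambda i: seq_scores[i], reverse=True)
print(ranking)
```

In this toy graph, the candidate linked to the most fragments ranks first, which mirrors the statement that sequences score higher when they link to many important fragments. In practice the fragment priors could encode evidence such as peak intensity, but that choice is an assumption here rather than something the abstract specifies.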