From the Program in Bioinformatics, Boston University, Boston, MA 02215.
Center for Biomedical Mass Spectrometry, Department of Biochemistry, Boston University School of Medicine, Boston, MA 02118.
Mol Cell Proteomics. 2018 Jul;17(7):1448-1456. doi: 10.1074/mcp.RA118.000590. Epub 2018 Apr 3.
Glycosaminoglycans (GAGs) covalently linked to proteoglycans (PGs) are characterized by repeating disaccharide units and variable sulfation patterns along the chain. GAG length and sulfation patterns impact disease etiology, cellular signaling, and structural support for cells. We and others have demonstrated the usefulness of tandem mass spectrometry (MS) for assigning the structures of GAG saccharides; however, manual interpretation of tandem mass spectra is time-consuming, so computational methods must be employed. In the proteomics domain, the identification of monoisotopic peaks and charge states relies on algorithms that use averagine, the average building block of the compound class being analyzed. Although these methods perform well for protein and peptide spectra, they perform poorly on GAG tandem mass spectra because a single average building block does not characterize the variable sulfation of GAG disaccharide units. In addition, product ion isotope patterns must be assigned in order to interpret the tandem mass spectra of GAG saccharides. To address these problems, we developed GAGfinder, the first tandem mass spectrum peak finding algorithm developed specifically for GAGs. We define peak finding as assigning experimental isotopic peaks directly to a given product ion composition, as opposed to deconvolution or peak picking, terms that more accurately describe the existing methods mentioned above. GAGfinder is a targeted, brute-force approach to spectrum analysis that uses precursor composition information to generate all theoretical fragments. GAGfinder also performs peak isotope composition annotation, which is typically a subsequent step for averagine-based methods. Data are available via ProteomeXchange with identifier PXD009101.
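The abstract describes the targeted, brute-force strategy only at a high level: enumerate every theoretical fragment permitted by the precursor composition, then assign experimental isotopic peaks directly to each candidate composition. The Python sketch below illustrates that general idea under simplifying assumptions that are not taken from the paper. It assumes a hypothetical chondroitin-like chain of uronic acid (`HEXA`) and N-acetylhexosamine (`HEXNAC`) residues. It models fragments as contiguous residue sets plus zero or one water, ignoring cross-ring cleavages and the exact B/C/Y/Z ion definitions. It also assumes negative-mode [M − zH]^z− ions and a fixed 10 ppm tolerance. All function names (`theoretical_fragments`, `find_peaks`, and so on) are hypothetical and do not correspond to GAGfinder's actual code.

```python
from itertools import product

# Monoisotopic element masses (Da) and the proton mass.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "S": 31.97207100}
PROTON = 1.00727646688
ISOTOPE_SPACING = 1.0033548  # 13C - 12C mass difference, used as the isotope step

# Natural isotope abundances indexed by nominal mass offset (+0, +1, +2, ...).
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}

# Hypothetical simplified building blocks (residue formulas, i.e. minus water).
HEXA = {"C": 6, "H": 8, "O": 6}             # uronic acid residue
HEXNAC = {"C": 8, "H": 13, "N": 1, "O": 5}  # N-acetylhexosamine residue
SULFATE = {"S": 1, "O": 3}                  # net change for one O-sulfation
WATER = {"H": 2, "O": 1}


def combine(*parts):
    """Sum (composition, multiplier) pairs into one elemental composition."""
    total = {}
    for comp, k in parts:
        for elem, count in comp.items():
            total[elem] = total.get(elem, 0) + count * k
    return total


def mz(comp, z):
    """Monoisotopic m/z of the deprotonated ion [M - zH]^z- (negative mode)."""
    mass = sum(MONO[e] * c for e, c in comp.items())
    return (mass - z * PROTON) / z


def isotope_pattern(comp, n_peaks=3):
    """Isotope distribution from this exact formula (atom-by-atom convolution), max-normalized."""
    dist = [1.0] + [0.0] * (n_peaks - 1)
    for elem, count in comp.items():
        for _ in range(count):
            new = [0.0] * n_peaks
            for i, p in enumerate(dist):
                for j, q in enumerate(ISOTOPES[elem][: n_peaks - i]):
                    new[i + j] += p * q
            dist = new
    top = max(dist)
    return [p / top for p in dist]


def theoretical_fragments(n_hexa, n_hexnac, n_so3, max_z):
    """Brute-force enumeration of fragment compositions bounded by the precursor."""
    for a, n, s, w in product(range(n_hexa + 1), range(n_hexnac + 1),
                              range(n_so3 + 1), (0, 1)):
        # Simplifying assumption: contiguous alternating chains, at most one sulfate per residue.
        if a + n == 0 or abs(a - n) > 1 or s > a + n:
            continue
        comp = combine((HEXA, a), (HEXNAC, n), (SULFATE, s), (WATER, w))
        # Assumption: charge is limited by the number of acidic (COOH + SO3H) sites.
        for z in range(1, min(max_z, a + s) + 1):
            yield (a, n, s, w, z), comp


def find_peaks(spectrum, candidates, ppm=10.0, n_peaks=3):
    """Assign experimental (m/z, intensity) peaks to each candidate's isotope envelope."""
    hits = []
    for label, comp in candidates:
        z = label[-1]
        base = mz(comp, z)
        matched = []
        for k in range(n_peaks):
            target = base + k * ISOTOPE_SPACING / z
            tol = target * ppm * 1e-6
            close = [pk for pk in spectrum if abs(pk[0] - target) <= tol]
            if not close:
                break  # the envelope must be contiguous from the monoisotopic peak
            matched.append(max(close, key=lambda pk: pk[1]))
        if len(matched) >= 2:  # require at least two isotopic peaks (assumption)
            hits.append((label, matched, isotope_pattern(comp, n_peaks)))
    return hits


if __name__ == "__main__":
    # Synthetic spectrum built from one candidate (sulfated disaccharide + water, z = 1).
    comp = combine((HEXA, 1), (HEXNAC, 1), (SULFATE, 1), (WATER, 1))
    pattern = isotope_pattern(comp)
    spectrum = [(mz(comp, 1) + k * ISOTOPE_SPACING, 1e5 * pattern[k]) for k in range(3)]
    candidates = list(theoretical_fragments(n_hexa=2, n_hexnac=2, n_so3=2, max_z=2))
    for label, peaks, theo in find_peaks(spectrum, candidates):
        print(label, [round(m, 4) for m, _ in peaks], [round(t, 3) for t in theo])
```

Each candidate carries its own elemental formula, so its isotope pattern is computed from that exact composition rather than from an averagine model. This is the contrast the abstract draws: sulfation changes the sulfur and oxygen content from one disaccharide unit to the next, which a single average building block cannot capture.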