Program for Bioinformatics, Boston University, Boston, MA, USA.
Department of Math and Statistics, Boston University, Boston, MA, USA.
Bioinformatics. 2018 Oct 15;34(20):3511-3518. doi: 10.1093/bioinformatics/bty397.
Glycosylation is one of the most heterogeneous and complex protein post-translational modifications. Liquid chromatography-coupled mass spectrometry (LC-MS) is a common high-throughput method for analyzing complex biological samples. Accurate study of glycans requires high-resolution mass spectrometry. Mass spectrometry data contain intricate sub-structures that encode mass and abundance, and several transformations are needed before the data can be used to identify biological molecules. Automated tools are therefore required to analyze samples in a high-throughput setting. Existing tools for interpreting the resulting data do not take related glycans into account when evaluating individual observations, which limits their sensitivity.
We developed an algorithm for assigning glycan compositions from LC-MS data by exploiting biosynthetic network relationships among glycans. Our algorithm optimizes a set of likelihood scoring functions based on glycan chemical properties, but uses network Laplacian regularization and, optionally, prior information about expected glycan families to smooth the likelihood and thus achieve a consistent and more representative solution. Our method identified as many or more glycan compositions than previous approaches, and demonstrated greater sensitivity with regularization. Our network definition was tailored to N-glycans, but the method may be applied to glycomics data from other glycan families, such as O-glycans or heparan sulfate, where the relationships between compositions can be expressed as a graph.
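The idea of smoothing per-composition scores over a network can be illustrated with a minimal sketch. This is a generic Laplacian-regularized least-squares smoother, not the published implementation: the objective, the function name `laplacian_smooth`, the weight `lam`, the three high-mannose composition labels, and the toy scores are all assumptions made for illustration, and the optional glycan-family prior is omitted.

```python
from typing import Dict, List, Tuple


def laplacian_smooth(
    scores: Dict[str, float],
    edges: List[Tuple[str, str]],
    lam: float = 0.5,
    sweeps: int = 200,
) -> Dict[str, float]:
    """Smooth node scores over an undirected graph.

    Minimizes  sum_i (phi_i - s_i)^2 + lam * sum_{(i,j) in edges} (phi_i - phi_j)^2,
    i.e. solves (I + lam * L) phi = s, where L is the graph Laplacian.
    The linear system is solved with Gauss-Seidel sweeps, which converge
    here because (I + lam * L) is strictly diagonally dominant.
    """
    neighbors: Dict[str, List[str]] = {node: [] for node in scores}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    phi = dict(scores)  # start from the observed scores
    for _ in range(sweeps):
        for node in phi:
            nbr_sum = sum(phi[n] for n in neighbors[node])
            degree = len(neighbors[node])
            # Stationarity condition of the objective for this node
            phi[node] = (scores[node] + lam * nbr_sum) / (1.0 + lam * degree)
    return phi


# Toy example (hypothetical values): three compositions that differ by one
# hexose each, where the middle one has no direct evidence of its own.
scores = {"HexNAc2Hex5": 0.9, "HexNAc2Hex6": 0.0, "HexNAc2Hex7": 0.8}
edges = [("HexNAc2Hex5", "HexNAc2Hex6"), ("HexNAc2Hex6", "HexNAc2Hex7")]
smoothed = laplacian_smooth(scores, edges)
for name, value in smoothed.items():
    print(f"{name}: observed={scores[name]:.2f} smoothed={value:.3f}")
```

In this sketch, a composition with weak direct evidence borrows support from well-scored neighbors, which is one way regularization could raise sensitivity; larger `lam` pulls neighboring scores closer together.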
Available at http://www.bumc.bu.edu/msr/glycresoft/; source code at https://github.com/BostonUniversityCBMS/glycresoft.
Supplementary data are available at Bioinformatics online.