Kou Qiang, Wu Si, Tolic Nikola, Paša-Tolic Ljiljana, Liu Yunlong, Liu Xiaowen
Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA.
Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK 73019, USA.
Bioinformatics. 2017 May 1;33(9):1309-1316. doi: 10.1093/bioinformatics/btw806.
Although proteomics has developed rapidly in the past decade, researchers are still at an early stage of exploring the world of complex proteoforms, which are protein products with various primary structure alterations resulting from gene mutations, alternative splicing, post-translational modifications, and other biological processes. Proteoform identification is essential for mapping proteoforms to their biological functions as well as for discovering novel proteoforms and new protein functions. Top-down mass spectrometry is the method of choice for identifying complex proteoforms because it provides a 'bird's eye view' of intact proteoforms. The combinatorial explosion of various alterations on a protein may result in billions of possible proteoforms, making proteoform identification a challenging computational problem.
We propose a new data structure, called the mass graph, for the efficient representation of proteoforms, and we design mass graph alignment algorithms. We developed TopMG, a mass graph-based software tool for proteoform identification by top-down mass spectrometry. Experiments on top-down mass spectrometry datasets showed that TopMG outperformed existing methods in identifying complex proteoforms.
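To illustrate why a graph can represent a combinatorially large set of proteoforms compactly, here is a minimal sketch under simplifying assumptions. It is not TopMG's actual construction: the vertex/edge model (vertices as positions between residues, parallel edges for modified and unmodified residues), the small residue and modification tables, and the names `build_mass_graph`, `count_proteoforms`, and `distinct_total_masses` are all hypothetical choices for illustration. With k modifiable residues the graph has only n + k edges, yet it encodes 2^k source-to-sink paths, each of which is one candidate proteoform.

```python
from collections import defaultdict

# Illustrative monoisotopic residue masses in daltons (small subset of amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "K": 128.09496, "M": 131.04049,
}

# Hypothetical variable modifications: residue -> (label, mass shift in Da).
VAR_MODS = {
    "S": ("Phospho", 79.96633),
    "T": ("Phospho", 79.96633),
    "M": ("Oxidation", 15.99491),
}


def build_mass_graph(sequence):
    """Build a DAG whose vertices 0..n are positions between residues.

    Edge i -> i+1 carries the unmodified residue mass; a parallel edge
    carries the modified residue mass when a variable modification applies.
    Every source-to-sink path corresponds to one candidate proteoform.
    """
    edges = defaultdict(list)
    for i, aa in enumerate(sequence):
        base = RESIDUE_MASS[aa]
        edges[i].append((i + 1, base, aa))
        if aa in VAR_MODS:
            label, shift = VAR_MODS[aa]
            edges[i].append((i + 1, base + shift, f"{aa}[{label}]"))
    return edges


def count_proteoforms(edges, n):
    """Count source-to-sink paths by dynamic programming in topological order."""
    ways = [0] * (n + 1)
    ways[0] = 1
    for v in range(n):
        for w, _mass, _label in edges[v]:
            ways[w] += ways[v]
    return ways[n]


def distinct_total_masses(edges, n):
    """Collect the distinct total residue masses reachable at the sink vertex."""
    reachable = [set() for _ in range(n + 1)]
    reachable[0].add(0.0)
    for v in range(n):
        for w, mass, _label in edges[v]:
            # Round to 5 decimals so equal compositions yield identical floats.
            reachable[w].update(round(m + mass, 5) for m in reachable[v])
    return sorted(reachable[n])


if __name__ == "__main__":
    seq = "MSTSK"
    graph = build_mass_graph(seq)
    n_edges = sum(len(graph[v]) for v in range(len(seq)))
    print(n_edges, count_proteoforms(graph, len(seq)))
    print(distinct_total_masses(graph, len(seq)))
```

For the toy sequence `MSTSK`, four residues are modifiable, so the sketch stores 9 edges while representing 16 proteoforms; many of those paths share the same total mass, which is one reason a mass-based graph can be searched more efficiently than an explicit list of proteoforms.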
Availability and implementation: http://proteomics.informatics.iupui.edu/software/topmg/.
Supplementary data are available at Bioinformatics online.