Kou Qiang, Zhu Binhai, Wu Si, Ansong Charles, Tolić Nikola, Paša-Tolić Ljiljana, Liu Xiaowen
Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States.
Department of Computer Science, Montana State University, Bozeman, Montana 59717, United States.
J Proteome Res. 2016 Aug 5;15(8):2422-32. doi: 10.1021/acs.jproteome.5b01098. Epub 2016 Jul 1.
Various proteoforms may be generated from a single gene due to primary structure alterations (PSAs) such as genetic variations, alternative splicing, and post-translational modifications (PTMs). Top-down mass spectrometry can analyze intact proteins and identify patterns of multiple PSAs, making it the method of choice for studying complex proteoforms. In top-down proteomics, proteoform identification is often performed by searching tandem mass spectra against a protein sequence database that contains only one reference protein sequence for each gene or transcript variant in a proteome. Because the protein database is incomplete, an identified proteoform may contain PSAs that are absent from the reference sequence. Proteoform characterization is the task of identifying and localizing the PSAs in a proteoform. Although many software tools have been proposed for proteoform identification by top-down mass spectrometry, the characterization of proteoforms in identified proteoform-spectrum matches still relies mainly on manual annotation. We propose the Modification Identification Score (MIScore), which is based on Bayesian models, to automatically identify and localize PTMs in proteoforms. Experiments showed that the MIScore is accurate in identifying and localizing one or two modifications.
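To make the idea of Bayesian PTM localization concrete, the following is a minimal illustrative sketch in Python. It is not the MIScore itself: the abstract does not specify the model, so everything here is an assumption made for illustration, including the function name `site_posteriors`, the uniform prior over candidate sites, and the parameters `p_true` and `p_noise` (the assumed probabilities of observing a fragment that is consistent or inconsistent with the true site). Each observed prefix fragment is recorded as its length and whether its mass carries the modification's mass shift; a prefix of length `j` should be shifted exactly when the modified residue lies at a position `<= j`.

```python
def site_posteriors(candidate_sites, observed_prefixes, p_true=0.9, p_noise=0.1):
    """Toy Bayesian localization of a single modification.

    candidate_sites:   1-based residue positions that could carry the modification.
    observed_prefixes: list of (prefix_length, is_shifted) pairs, one per matched
                       prefix fragment ion.
    p_true, p_noise:   hypothetical likelihoods of a fragment observation that
                       agrees or disagrees with the hypothesized site.
    Returns a dict mapping each candidate site to its posterior probability
    under a uniform prior.
    """
    weights = {}
    for site in candidate_sites:
        likelihood = 1.0
        for length, is_shifted in observed_prefixes:
            # A prefix contains the modified residue iff it reaches the site.
            expected_shift = length >= site
            likelihood *= p_true if is_shifted == expected_shift else p_noise
        weights[site] = likelihood  # uniform prior cancels on normalization
    total = sum(weights.values())
    return {site: w / total for site, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical example: three candidate residues, four matched prefix ions.
    observed = [(2, False), (4, False), (6, True), (9, True)]
    for site, prob in site_posteriors([3, 5, 8], observed).items():
        print(f"site {site}: posterior {prob:.3f}")
```

In this hypothetical example, the unshifted prefix of length 4 and the shifted prefix of length 6 bracket residue 5, so site 5 receives most of the posterior mass (9/11 under the assumed parameters), while sites 3 and 8 each contradict one observation. A real scoring model would also have to account for suffix ions, mass errors, and multiple modifications.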