College of Automation, Harbin Engineering University, 145, Nan Tong Street, Harbin, Heilongjiang, 150001, China.
Department of Medical and Molecular Genetics, Indiana University School of Medicine, 410 West 10th Street, Indianapolis, IN, 46202, USA.
BMC Bioinformatics. 2018 Dec 28;19(Suppl 17):494. doi: 10.1186/s12859-018-2462-1.
Top-down mass spectrometry has unique advantages in identifying proteoforms with multiple post-translational modifications and/or unknown alterations. Most software tools in this area identify proteoforms by searching top-down mass spectra against a protein sequence database. When no proteome sequence database is available for the species studied in a mass spectrometry experiment, a homologous protein sequence database can be used instead. How closely the homologous sequences match the true protein sequences affects both the sensitivity of proteoform identification and the accuracy of mass shift localization.
We tested TopPIC, a commonly used software tool for top-down mass spectral identification, on a top-down mass spectrometry data set of Escherichia coli K12 MG1655, and evaluated its performance using an Escherichia coli K12 MG1655 proteome database and a homologous protein database. The number of identified spectra with the homologous database was about half of that with the Escherichia coli K12 MG1655 database. We also tested TopPIC on a top-down mass spectrometry data set of human MCF-7 cells and obtained similar results.
The experimental results demonstrated that TopPIC can identify many proteoform-spectrum matches and localize unknown alterations when the homologous protein sequences contain no more than two mutations.
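To illustrate why mutations in a homologous sequence matter, the following minimal Python sketch computes the mass shift that each amino acid substitution introduces relative to the database sequence. It is an illustration only and not TopPIC's algorithm: the function name `substitution_shifts` and the example peptides are hypothetical, the two sequences are assumed to be already aligned and of equal length, and the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def substitution_shifts(homolog: str, target: str):
    """List the mass shift caused by each substitution.

    Hypothetical helper: `homolog` is the database sequence and `target`
    is the true sequence of the studied species. Both are assumed to be
    aligned without gaps, so only substitutions are considered.
    Returns tuples (1-based position, homolog residue, target residue,
    mass shift in Da), where shift = mass(target) - mass(homolog).
    """
    if len(homolog) != len(target):
        raise ValueError("sequences must have equal length")
    shifts = []
    for pos, (h, t) in enumerate(zip(homolog, target), start=1):
        if h != t:
            shifts.append((pos, h, t, RESIDUE_MASS[t] - RESIDUE_MASS[h]))
    return shifts


# Made-up example: the homolog has V where the target has A.
shifts = substitution_shifts("MKVLA", "MKALA")
for pos, h, t, delta in shifts:
    print(f"position {pos}: {h} -> {t}, mass shift {delta:+.4f} Da")
```

In a database search against the homolog, each such substitution looks like an unknown mass shift that has to be localized, which gives some intuition for why identification becomes harder as the number of mutations grows.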