Liu Xiaowen, Dekker Lennard J M, Wu Si, Vanduijn Martijn M, Luider Theo M, Tolić Nikola, Kou Qiang, Dvorkin Mikhail, Alexandrova Sonya, Vyatkina Kira, Paša-Tolić Ljiljana, Pevzner Pavel A
Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 535 West Michigan Street, IT 475, Indianapolis, Indiana 46202, United States.
J Proteome Res. 2014 Jul 3;13(7):3241-8. doi: 10.1021/pr401300m. Epub 2014 Jun 18.
There are two approaches for de novo protein sequencing: Edman degradation and mass spectrometry (MS). Existing MS-based methods characterize a novel protein by assembling tandem mass spectra of overlapping peptides generated from multiple proteolytic digestions of the protein. Because each tandem mass spectrum covers only a short peptide of the target protein, the key to high-coverage protein sequencing is to find spectral pairs from overlapping peptides so that the tandem mass spectra can be assembled into longer ones. However, the overlapping regions of peptides may be too short to be confidently identified. High-resolution mass spectrometers have become accessible to many laboratories. These instruments can analyze molecules with large masses, which has boosted the development of top-down MS. Top-down tandem mass spectra cover whole proteins. However, top-down tandem mass spectra, even when combined, rarely provide full ion fragmentation coverage of a protein. We propose an algorithm, TBNovo, for de novo protein sequencing by combining top-down and bottom-up MS. In TBNovo, a top-down tandem mass spectrum is used as a scaffold, and bottom-up tandem mass spectra are aligned to the scaffold to increase sequence coverage. Experiments on data sets of two proteins showed that TBNovo achieved high sequence coverage and high sequence accuracy.
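The scaffold idea can be illustrated with a minimal Python sketch. It is not TBNovo's actual alignment or scoring, which the abstract does not specify. It assumes that each spectrum has already been reduced to a sorted list of prefix residue masses, that a bottom-up peptide is placed at the mass offset matching the most scaffold masses, and that shifted peptide masses absent from the scaffold are then added to it. The function names, the 0.01 Da tolerance, and the toy sequence GASPVT are all hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

# Standard monoisotopic residue masses (Da) for the toy sequence only.
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
           "P": 97.05276, "V": 99.06841, "T": 101.04768}

def prefix_masses(seq):
    """Prefix masses of seq, starting at 0.0 for the empty prefix."""
    return [0.0] + list(accumulate(RESIDUE[r] for r in seq))

def count_matches(scaffold, masses, tol):
    """Count entries of masses lying within tol of some mass in the sorted scaffold."""
    n = 0
    for m in masses:
        i = bisect_left(scaffold, m - tol)
        if i < len(scaffold) and scaffold[i] <= m + tol:
            n += 1
    return n

def best_offset(scaffold, peptide, tol=0.01):
    """Return (offset, matches) for the shift that matches the most peptide masses."""
    best = (0.0, 0)
    # Candidate shifts: every scaffold mass minus every peptide mass.
    candidates = sorted({round(s - p, 4) for s in scaffold for p in peptide})
    for off in candidates:
        n = count_matches(scaffold, [p + off for p in peptide], tol)
        if n > best[1]:
            best = (off, n)
    return best

def merge(scaffold, peptide, offset, tol=0.01):
    """Add shifted peptide masses that have no counterpart in the scaffold."""
    extra = [p + offset for p in peptide
             if count_matches(scaffold, [p + offset], tol) == 0]
    return sorted(scaffold + extra)

# Toy data: a scaffold with two missing cleavage sites (after GA and after GASP).
full = prefix_masses("GASPVT")
scaffold = [m for i, m in enumerate(full) if i not in (2, 4)]
peptide = prefix_masses("SPV")  # hypothetical bottom-up peptide inside the protein

offset, matches = best_offset(scaffold, peptide)
merged = merge(scaffold, peptide, offset)
print(round(offset, 3), matches, len(scaffold), len(merged))
```

In this toy case the peptide is anchored by the two masses it shares with the scaffold and contributes the two cleavage sites the scaffold lacked. Real spectra contain noise peaks, both prefix and suffix ions, and mass errors, so a practical method would need a more careful scoring scheme than a raw match count.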