Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0017, Japan.
Genes Cells. 2012 Aug;17(8):633-44. doi: 10.1111/j.1365-2443.2012.01615.x. Epub 2012 Jun 12.
We have developed a novel bioinformatics method called mass spectrum sequential subtraction (MSSS) to search large peptide spectra datasets produced by liquid chromatography/mass spectrometry (LC-MS/MS) against protein databases and large nucleotide sequence databases. The main principle of MSSS is to search the peptide spectra set against the protein database first, and then to remove the spectra corresponding to the identified peptides, leaving a smaller set of remaining spectra to search against the nucleotide sequence database. This reduces the number of spectra to be searched and thereby limits the peptide search space. A comparison of MSSS with the conventional search approach, using a dataset of 27 LC-MS/MS runs of cultured rice cells, indicated that MSSS reduced the search queries to 50% and the search time to 75% on average. In addition, MSSS had no effect on the identification false-positive rate (FPR) or on the ability to identify novel peptide sequences. We used MSSS to analyze another dataset of 34 LC-MS/MS runs, which identified 74 additional novel peptides. Proteogenomic analysis with these additional peptides yielded 47 new genomic features in 24 rice genes plus 24 intergenic peptides. These results demonstrate the utility of MSSS in searching large databases with large MS/MS datasets for proteogenomics.
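The two-stage principle described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `sequential_subtraction`, the `search` callback signature, and the `toy_search` stand-in (which simply looks up a pre-assigned peptide string in a set) are all hypothetical names introduced here for illustration. A real workflow would call an MS/MS search engine in place of `toy_search`.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical types: a spectrum set maps spectrum IDs to spectrum data,
# and a search returns {spectrum ID: identified peptide} for the hits only.
Spectra = Dict[str, str]
Hits = Dict[str, str]
SearchFn = Callable[[Spectra, Set[str]], Hits]


def sequential_subtraction(
    spectra: Spectra,
    search: SearchFn,
    protein_db: Set[str],
    nucleotide_db: Set[str],
) -> Tuple[Hits, Hits, Spectra]:
    """Search the protein database first, subtract identified spectra,
    then search only the remaining spectra against the nucleotide database."""
    # Stage 1: search the full spectra set against the protein database.
    protein_hits = search(spectra, protein_db)

    # Subtraction: drop every spectrum that was identified in stage 1.
    remaining = {
        sid: data for sid, data in spectra.items() if sid not in protein_hits
    }

    # Stage 2: search the reduced set against the nucleotide database.
    nucleotide_hits = search(remaining, nucleotide_db)
    return protein_hits, nucleotide_hits, remaining


def toy_search(spectra: Spectra, database: Set[str]) -> Hits:
    """Stand-in for a real search engine (assumption): each 'spectrum' is
    already a peptide string, and a hit is a plain set-membership match."""
    return {sid: pep for sid, pep in spectra.items() if pep in database}


if __name__ == "__main__":
    spectra = {
        "s1": "PEPTIDEA",
        "s2": "PEPTIDEB",
        "s3": "NOVELPEP",
        "s4": "UNMATCHED",
    }
    protein_db = {"PEPTIDEA", "PEPTIDEB"}
    nucleotide_db = {"PEPTIDEA", "NOVELPEP"}

    prot, nuc, rest = sequential_subtraction(
        spectra, toy_search, protein_db, nucleotide_db
    )
    print("protein hits:", prot)
    print("queries sent to nucleotide search:", len(rest), "of", len(spectra))
    print("nucleotide-only hits:", nuc)
```

In this toy run only the spectra left unidentified by the protein search are submitted to the second, larger search, which is the source of the query reduction reported above. The actual savings depend on the dataset and the search engine.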