Department of Biology, Indiana University, Bloomington, IN 47405, USA.
Proteomics. 2013 Mar;13(5):756-65. doi: 10.1002/pmic.201100670. Epub 2013 Feb 4.
Spectral library searching is an important new approach to improving the quality of peptide and protein identification from MS/MS data. The idea relies on the observation that the ion intensities in the MS/MS spectrum of a given peptide are generally reproducible across experiments. Matching experimental spectra against the spectra of previously identified peptides stored in a spectral library can therefore identify peptides better than a traditional database search. However, the usefulness of libraries is severely limited by their coverage of peptide sequences: even for well-studied organisms, a large fraction of peptides have never been identified. To address this issue, we propose to expand spectral libraries by predicting the MS/MS spectra of peptides from the spectra of peptides with similar sequences. We first demonstrate that the intensity patterns of dominant fragment ions tend to be similar between peptides with similar sequences. Building on this observation, we develop a neighbor-based approach that first selects peptides likely to have spectra similar to that of the target peptide and then combines their spectra using a weighted K-nearest neighbor method to accurately predict the fragment ion intensities of the target peptide. This approach has the potential to predict spectra for every peptide in the proteome. When rigorous quality criteria are applied, we estimate that the method increases the coverage of the spectral libraries available from the National Institute of Standards and Technology (NIST) by 20-60%, with the exact gain varying with peptide length and charge state. We find that the best overall search performance is achieved when spectral libraries are supplemented with the high-quality predicted spectra.
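To make the neighbor-based idea concrete, the following is a minimal sketch of weighted K-nearest-neighbor spectrum prediction, not the authors' implementation. Several details are assumptions made only for illustration. Peptide similarity is measured by Hamming distance between sequences of equal length. Neighbor weights are 1/(1 + distance). A spectrum is a map from a hypothetical fragment-ion label (for example "b2" or "y5") to a unit-normalized intensity. Charge state is ignored. The helper names (`sequence_distance`, `predict_spectrum`, `cosine_similarity`) and the toy library are hypothetical. The sketch also uses a normalized dot product (cosine similarity), a common spectral-match score, to compare the predicted spectrum with library spectra.

```python
import math
from typing import Dict

# Hypothetical representation: fragment-ion label -> relative intensity.
Spectrum = Dict[str, float]


def sequence_distance(a: str, b: str) -> int:
    """Hamming distance between equal-length peptides (assumed similarity metric)."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares peptides of equal length")
    return sum(x != y for x, y in zip(a, b))


def normalize(spec: Spectrum) -> Spectrum:
    """Scale intensities so the intensity vector has unit Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in spec.values()))
    return {ion: v / norm for ion, v in spec.items()} if norm > 0 else dict(spec)


def predict_spectrum(target: str, library: Dict[str, Spectrum], k: int = 3) -> Spectrum:
    """Predict a spectrum for `target` as a weighted average of its k nearest neighbors."""
    # Candidate neighbors: library peptides of the same length (assumption).
    candidates = [(sequence_distance(target, pep), pep)
                  for pep in library if len(pep) == len(target) and pep != target]
    neighbors = sorted(candidates)[:k]  # smallest distance first; ties broken alphabetically
    if not neighbors:
        return {}
    predicted: Spectrum = {}
    total_weight = 0.0
    for dist, pep in neighbors:
        weight = 1.0 / (1.0 + dist)  # closer sequences contribute more (assumed weighting)
        total_weight += weight
        for ion, intensity in normalize(library[pep]).items():
            predicted[ion] = predicted.get(ion, 0.0) + weight * intensity
    # Weighted average, then renormalize to unit length.
    return normalize({ion: v / total_weight for ion, v in predicted.items()})


def cosine_similarity(a: Spectrum, b: Spectrum) -> float:
    """Normalized dot product over shared ion labels (missing ions count as zero)."""
    a, b = normalize(a), normalize(b)
    return sum(v * b.get(ion, 0.0) for ion, v in a.items())


# Toy library with made-up intensities, for illustration only.
library: Dict[str, Spectrum] = {
    "PEPTIDEK": {"y1": 10.0, "y2": 40.0, "b2": 20.0},
    "PEPTIDER": {"y1": 12.0, "y2": 35.0, "b2": 25.0},
    "PAPTIDEK": {"y1": 8.0, "y2": 45.0, "b2": 15.0},
    "GGGGGGGK": {"y1": 50.0, "y2": 5.0, "b2": 5.0},
}

predicted = predict_spectrum("PEPTLDEK", library, k=3)
print({ion: round(v, 3) for ion, v in predicted.items()})
print(round(cosine_similarity(predicted, library["PEPTIDEK"]), 3))
```

In this toy example, the three closest peptides dominate the prediction, while the dissimilar `GGGGGGGK` is excluded by the choice of k. A real predictor would also need a quality criterion to decide which predicted spectra are reliable enough to add to a library.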