Liu Jian, Bell Alexander W, Bergeron John J M, Yanofsky Corey M, Carrillo Brian, Beaudrie Christian E H, Kearney Robert E
Center for Cellular & Biomolecular Research, University of Toronto, Toronto, Canada.
Proteome Sci. 2007 Jan 16;5:3. doi: 10.1186/1477-5956-5-3.
Tandem mass spectrometry followed by database search is currently the predominant technology for peptide sequencing in shotgun proteomics experiments. Most methods compare experimentally observed spectra to the theoretical spectra predicted from the sequences in protein databases. There is a growing interest, however, in comparing unknown experimental spectra to a library of previously identified spectra. This approach has the advantage of taking into account instrument-dependent factors and peptide-specific differences in fragmentation probabilities. It is also computationally more efficient for high-throughput proteomics studies.
This paper investigates computational issues related to this spectral comparison approach. Different methods were evaluated empirically on several large sets of spectra. First, we show that peak intensities follow a Poisson distribution, which implies that a square root transform will optimally stabilize the peak intensity variance. In our results, the square root transform did indeed outperform other transforms, improving the accuracy of spectral matching. Second, we compared different measures of spectral similarity and found the correlation coefficient to be the most robust. Finally, we examined how to combine multiple spectra associated with the same peptide into a synthetic reference spectrum; ensemble averaging provided the best combination of accuracy and efficiency.
Our results demonstrate that when combined, these methods can boost the sensitivity and specificity of spectral comparison. Therefore they are capable of enhancing and complementing existing tools for consistent and accurate peptide identification.
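The three steps described above (square root transform, correlation coefficient, ensemble averaging) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify binning, so the 1.0 m/z bin width, the 2000 m/z upper limit, the choice to average after the transform rather than before, and all function names are hypothetical. The motivation for the square root is that a Poisson variable has variance equal to its mean, so by a first-order expansion its square root has roughly constant variance (about 1/4) regardless of intensity.

```python
import math
from typing import List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); representation is an assumption


def bin_spectrum(peaks: Sequence[Peak], bin_width: float = 1.0,
                 max_mz: float = 2000.0) -> List[float]:
    """Sum peak intensities into fixed-width m/z bins (hypothetical binning)."""
    n_bins = int(max_mz / bin_width) + 1
    bins = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < n_bins:
            bins[idx] += intensity
    return bins


def sqrt_transform(vector: Sequence[float]) -> List[float]:
    """Square root transform, which stabilizes variance for Poisson-like counts."""
    return [math.sqrt(x) for x in vector]


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # a constant vector carries no shape information
    return cov / math.sqrt(var_a * var_b)


def ensemble_average(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Bin-wise mean of several replicate spectra of the same peptide."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


if __name__ == "__main__":
    # Made-up replicate spectra of one peptide, plus a query spectrum.
    replicates = [
        [(100.2, 400.0), (250.7, 900.0), (500.1, 100.0)],
        [(100.4, 360.0), (250.3, 1000.0), (500.6, 120.0)],
    ]
    query = [(100.1, 380.0), (250.9, 950.0), (500.3, 90.0)]

    reference = ensemble_average(
        [sqrt_transform(bin_spectrum(s)) for s in replicates]
    )
    score = correlation(sqrt_transform(bin_spectrum(query)), reference)
    print(f"similarity to reference: {score:.4f}")
```

In a library search, the query would be scored against every synthetic reference spectrum in the library and the highest-scoring peptide reported; the thresholds and precursor-mass filtering that a real tool would need are omitted here.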