Evaluating de novo sequencing in proteomics: already an accurate alternative to database-driven peptide identification?

Affiliation

Research Group Bioinformatics, Robert Koch Institute, Berlin, Germany.

Publication

Brief Bioinform. 2018 Sep 28;19(5):954-970. doi: 10.1093/bib/bbx033.

Abstract

While peptide identifications in mass spectrometry (MS)-based shotgun proteomics are mostly obtained using database search methods, high-resolution spectrum data from modern MS instruments now offer the prospect of improving the performance of computational de novo peptide sequencing. The major benefit of de novo sequencing is that it does not require a reference database: full-length or partial tag-based peptide sequences are deduced directly from experimental tandem mass spectrometry (MS/MS) spectra. Although various algorithms have been developed for automated de novo sequencing, the prediction accuracy of the proposed solutions has rarely been evaluated in independent benchmarking studies. The main objective of this work is to provide a detailed evaluation of the performance of de novo sequencing algorithms on high-resolution data. For this purpose, we used the software packages Novor, PEAKS and PepNovo to process four experimental data sets acquired on different instrument types with collision-induced dissociation (CID) and higher energy collisional dissociation (HCD) fragmentation. We also tested the accuracy of these algorithms on ground truth data consisting of simulated spectra generated with peak intensity prediction software. We found that Novor shows the best overall performance compared with PEAKS and PepNovo with respect to the accuracy of correct full-peptide, tag-based and single-residue predictions. In addition, Novor ran around 12-17 times faster than its commercial competitor PEAKS. With around 35% prediction accuracy for complete peptide sequences on the HCD data sets, the evaluated algorithms, taken as a whole, perform only moderately on experimental data but show significantly better performance on simulated data (up to 84% accuracy). Further, we describe the most frequently occurring de novo sequencing errors and evaluate the influence of missing fragment ion peaks and spectral noise on the accuracy. Finally, we discuss the potential for de novo sequencing to become more widely used in the field.
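The core idea behind de novo sequencing can be illustrated with a toy example. Consecutive fragment ions in a b-ion series differ by exactly one residue mass, so a sequence can be read off the spectrum without any database. The sketch below is a deliberately simplified illustration and does not describe how Novor, PEAKS or PepNovo work; those tools use far richer scoring models. It assumes a noise-free, singly charged, unmodified b-ion ladder. The helper names `simulate_b_ions` and `read_b_ladder` are hypothetical. The example also shows two error sources the abstract mentions: isobaric residues (I/L), and a missing fragment peak that merges two residues into one unexplained gap. The 0.02 Da tolerance is an assumption, chosen tight enough to separate Q from K (about 0.036 Da apart). A tolerance this tight presumes high-resolution data.

```python
"""Toy de novo reading of a b-ion ladder (illustrative sketch only)."""

# Monoisotopic residue masses (Da). Leucine and isoleucine are isobaric,
# so a mass-only reading can only ever report "L" for either of them.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # proton mass (Da)
WATER = 18.010565   # monoisotopic H2O mass (Da)


def residue_mass(aa):
    # Isoleucine has the same elemental composition as leucine.
    return RESIDUE_MASS["L" if aa == "I" else aa]


def simulate_b_ions(peptide):
    """Return singly charged b1..b(n-1) m/z values and [M+H]+ (no modifications)."""
    ladder, running = [], PROTON
    for aa in peptide[:-1]:
        running += residue_mass(aa)
        ladder.append(running)
    precursor = PROTON + WATER + sum(residue_mass(aa) for aa in peptide)
    return ladder, precursor


def read_b_ladder(peaks, precursor, tol=0.02):
    """Assign one residue per mass gap between consecutive peaks.

    The final gap is closed with the precursor ([M+H]+ minus water, i.e. bn).
    Gaps that match no single residue within `tol` Da are reported as '?'.
    """
    anchors = [PROTON] + sorted(peaks) + [precursor - WATER]
    sequence = []
    for lower, upper in zip(anchors, anchors[1:]):
        gap = upper - lower
        best = min(RESIDUE_MASS, key=lambda aa: abs(RESIDUE_MASS[aa] - gap))
        sequence.append(best if abs(RESIDUE_MASS[best] - gap) <= tol else "?")
    return "".join(sequence)


if __name__ == "__main__":
    peaks, precursor = simulate_b_ions("PEPTIDE")
    print(read_b_ladder(peaks, precursor))      # PEPTLDE (I read as L)
    gapped = peaks[:2] + peaks[3:]              # drop the b3 peak
    print(read_b_ladder(gapped, precursor))     # PE?LDE (P+T fall into one gap)
```

With a complete ladder every residue is recovered except that I is reported as L. Once a single peak is removed, the two residues it separated collapse into one gap that no single residue explains. This is why missing fragment ions and noise weigh so heavily on full-peptide accuracy, even when residue-level predictions remain partly correct.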
