Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.
J Proteome Res. 2013 Feb 1;12(2):615-25. doi: 10.1021/pr3006843. Epub 2012 Dec 28.
De novo peptide sequencing is the only tool for extracting peptide sequences directly from tandem mass spectrometry (MS) data without any protein database. However, neither the accuracy nor the efficiency of de novo sequencing has been satisfactory, mainly because experimental spectra contain incomplete fragmentation information. Recent advances in MS technology have enabled the acquisition of higher-energy collisional dissociation (HCD) and electron transfer dissociation (ETD) spectra of the same precursor. These spectra contain complementary fragmentation information and can be collected with high resolution and high mass accuracy. Taking advantage of these properties, we have developed a new algorithm called pNovo+, which greatly improves the accuracy and speed of de novo sequencing. On tryptic peptides, 86% of the topmost candidate sequences deduced by pNovo+ from HCD + ETD spectral pairs matched the database search results, and the success rate reached 95% if the top three candidates were included, which was much higher than using only HCD (87%) or only ETD spectra (57%). On Asp-N, Glu-C, or elastase digested peptides, 69-87% of the HCD + ETD spectral pairs were correctly identified by pNovo+ among the topmost candidates, or 84-95% among the top three. On average, pNovo+ takes only 0.018 s to extract the sequence from a spectrum or spectral pair on a common personal computer, which is more than three times as fast as other de novo sequencing programs. The speed increase is mainly due to pDAG, a component algorithm of pNovo+. pDAG finds the k longest paths in a directed acyclic graph without the antisymmetry restriction. We have verified that the antisymmetry restriction is unnecessary for high-resolution, high-mass-accuracy data. The extensive use of HCD and ETD spectral information and the pDAG algorithm make pNovo+ an excellent de novo sequencing tool.
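To illustrate what finding the k longest paths in a directed acyclic graph involves, the following is a minimal sketch of a generic dynamic-programming approach. It is not the pDAG algorithm itself, whose details are not given in the abstract. The function name `k_longest_paths`, the integer node numbering, and the edge weights are hypothetical choices made for illustration. The sketch assumes nodes are already numbered in topological order (as would be the case if spectrum-graph nodes were sorted by mass) and applies no antisymmetry restriction.

```python
import heapq
from collections import defaultdict


def k_longest_paths(n_nodes, edges, source, sink, k):
    """Return up to k highest-scoring source->sink paths in a DAG.

    Assumption (illustrative, not from the paper): nodes are integers
    0..n_nodes-1 already in topological order, so every edge (u, v, w)
    satisfies u < v. No antisymmetry restriction is applied.
    """
    incoming = defaultdict(list)
    for u, v, w in edges:
        if not u < v:
            raise ValueError("edges must go from a lower to a higher node index")
        incoming[v].append((u, w))

    # best[v] holds up to k (score, path) tuples ending at v, best first.
    best = [[] for _ in range(n_nodes)]
    best[source] = [(0.0, (source,))]

    for v in range(source + 1, sink + 1):
        # Every top-k path into v extends a top-k path into a predecessor u,
        # so it suffices to combine the predecessors' kept lists.
        candidates = [
            (score + w, path + (v,))
            for u, w in incoming[v]
            for score, path in best[u]
        ]
        best[v] = heapq.nlargest(k, candidates)

    return best[sink]


# Hypothetical toy graph: four nodes, weights chosen arbitrarily.
toy_edges = [
    (0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0),
    (0, 2, 1.5), (1, 3, 2.5), (0, 3, 1.0),
]
top_two = k_longest_paths(4, toy_edges, source=0, sink=3, k=2)
```

In this toy graph, `top_two` contains the paths 0→1→3 (score 3.5) and 0→1→2→3 (score 3.0). Keeping only the k best partial paths at each node bounds the work per node by k times its number of incoming edges, which is one plausible reason such a search can be fast.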