Center for Systems and Synthetic Biology, Department of Molecular Biosciences, University of Texas, Austin, Texas 78712, United States.
Department of Chemistry, University of Texas, Austin, Texas 78712, United States.
Anal Chem. 2017 Mar 21;89(6):3747-3753. doi: 10.1021/acs.analchem.7b00130. Epub 2017 Mar 8.
We describe a strategy for de novo peptide sequencing based on matched pairs of tandem mass spectra (MS/MS) obtained by collision-induced dissociation (CID) and 351 nm ultraviolet photodissociation (UVPD). Each precursor ion is isolated twice, with the mass spectrometer switching between CID and UVPD activation modes to obtain a complementary MS/MS pair. To interpret these paired spectra, we modified the UVnovo de novo sequencing software so that, given a representative set of training data, it automatically learns from and interprets fragmentation spectra. This machine learning procedure, using random forests, synthesizes information from one or multiple complementary spectra, such as the CID/UVPD pairs, into peptide fragmentation site predictions. In doing so, it shifts the burden of fragmentation model definition from programmer to machine and opens up the model parameter space to nonobvious features and interactions. This spectral synthesis also transforms distinct types of spectra into a common representation for subsequent activation-independent processing steps. Then, independent of precursor activation constraints, UVnovo's de novo sequencing procedure generates and scores sequence candidates for each precursor. We demonstrate the combined experimental and computational approach for de novo sequencing using whole-cell E. coli lysate. In benchmarks on the CID/UVPD data, UVnovo assigned correct full-length sequences to 83% of the spectral pairs of doubly charged ions with high-confidence database identifications. Considering only top-ranked de novo predictions, 70% of the pairs were deciphered correctly. This de novo sequencing performance exceeds that of PEAKS and PepNovo on the CID spectra and that of UVnovo on CID or UVPD spectra alone. As presented here, the methods for paired CID/UVPD spectral acquisition and interpretation constitute a powerful workflow for high-throughput and accurate de novo peptide sequencing.
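The idea of turning a CID/UVPD pair into a common, per-site representation can be illustrated with a minimal Python sketch. It is not UVnovo's actual feature set, which the abstract does not specify. The sketch assumes singly charged b and y ions are the only ion types considered, uses a fixed 0.02 m/z matching tolerance, and introduces the hypothetical helpers `peak_intensity`, `prefix_masses`, and `site_features`. A classifier such as a random forest would consume vectors like these; that step is omitted here.

```python
from itertools import accumulate

PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)

# Monoisotopic residue masses (Da) for a few amino acids; extend as needed.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
}

def peak_intensity(spectrum, mz, tol=0.02):
    """Highest intensity among peaks within `tol` of `mz`, or 0.0 if none."""
    return max((inten for peak_mz, inten in spectrum
                if abs(peak_mz - mz) <= tol), default=0.0)

def prefix_masses(sequence):
    """Neutral prefix residue masses at each internal cleavage site."""
    sums = list(accumulate(RESIDUE_MASS[aa] for aa in sequence))
    return sums[:-1]  # the last sum is the whole peptide, not a cleavage site

def site_features(prefix_mass, peptide_mass, cid, uvpd, tol=0.02):
    """Feature vector [CID b, CID y, UVPD b, UVPD y] for one candidate site.

    Assumption: only singly charged b and y ions are looked up.
    """
    b_mz = prefix_mass + PROTON
    y_mz = peptide_mass - prefix_mass + PROTON
    return [peak_intensity(spec, mz, tol)
            for spec in (cid, uvpd) for mz in (b_mz, y_mz)]

# Toy example: peptide "GAS" with made-up peak lists of (m/z, intensity).
sequence = "GAS"
peptide_mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
cid_spectrum = [(58.0287, 10.0), (177.0870, 40.0)]
uvpd_spectrum = [(177.0871, 25.0)]

features = [site_features(m, peptide_mass, cid_spectrum, uvpd_spectrum)
            for m in prefix_masses(sequence)]
```

In this toy case, the site after G is supported by a b ion in the CID spectrum and by y ions in both spectra, while the site after A has no support in either. Each site gets one fixed-length vector regardless of which activation modes produced the evidence, so later steps need not know how the spectra were acquired.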