Zhejiang University, Hangzhou, Zhejiang Province, China.
School of Engineering, Westlake University, Hangzhou, Zhejiang Province 310024, China.
Bioinformatics. 2022 Mar 4;38(6):1525-1531. doi: 10.1093/bioinformatics/btab878.
Peptide identification in data-independent acquisition (DIA) mass spectrometry with the peptide-centric approach relies heavily on spectral library matching, such as fragment intensity similarity. If intensity similarity is calculated over all possible fragment ions of a targeted peptide rather than over the few fragment ions provided by the spectral library, the matching becomes more comprehensive and reliable, and the identifications become more confident. In addition, the emergence of high-precision spectrum predictors such as Prosit makes it possible to use the predicted spectrum, which contains the intensities of all possible fragment ions, to calculate intensity similarity for DIA data.
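To make the all-fragment-ion matching idea concrete, the sketch below scores a peptide by the normalized spectral contrast angle between predicted and measured fragment intensities, a similarity measure commonly used with Prosit predictions. It is an illustration under stated assumptions, not Alpha-Tri's learned score: the function name `spectral_angle_similarity` is hypothetical, and both intensity vectors are assumed to be already aligned by fragment ion.

```python
import math


def spectral_angle_similarity(predicted, measured):
    """Normalized spectral contrast angle in [0, 1]; 1 means identical shape.

    Hypothetical helper for illustration. Both inputs are fragment-ion
    intensity vectors aligned by ion (same order, same length).
    """
    if len(predicted) != len(measured):
        raise ValueError("intensity vectors must be aligned by fragment ion")
    norm_p = math.sqrt(sum(x * x for x in predicted))
    norm_m = math.sqrt(sum(x * x for x in measured))
    if norm_p == 0 or norm_m == 0:
        return 0.0  # no signal in one of the spectra
    cos = sum(p * m for p, m in zip(predicted, measured)) / (norm_p * norm_m)
    cos = max(-1.0, min(1.0, cos))  # guard against floating-point overshoot
    return 1.0 - 2.0 * math.acos(cos) / math.pi


# Hypothetical intensities: the first three ions stand in for a small
# library subset; the last two are ions the library would not list
# (one missing in the measurement, one inflated by interference).
pred = [1.0, 0.8, 0.6, 0.4, 0.2]
meas = [1.0, 0.8, 0.6, 0.0, 0.9]
subset_score = spectral_angle_similarity(pred[:3], meas[:3])
all_ion_score = spectral_angle_similarity(pred, meas)
```

With these made-up numbers the three-ion subset matches almost perfectly, while the all-ion score is clearly lower because it also sees the disagreeing ions; this is the kind of extra evidence that matching over all possible fragment ions can expose.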
In this work, we propose Alpha-Tri, a neural-network-based model that calculates intensity similarity as a post-processing score from three spectra (the triple-spectrum): the predicted spectrum, the measured spectrum and the correlation spectrum. The predicted spectrum is generated by Prosit; the measured spectrum is retrieved at the apex of the chromatograms of all possible fragment ions; and the correlation spectrum indicates the presence probability of each of these fragment ions, which is needed because the link between a precursor and its fragment ions is lost in DIA. By adopting a data-driven method, Alpha-Tri learns the intensity similarity from the triple-spectrum. This learned value is appended to the initial scores from DIA-NN, allowing the downstream statistical validation tool to report more peptides at the same false discovery rate (FDR). In our evaluation on the HeLa dataset with gradient lengths ranging from 0.5 to 2 h, Alpha-Tri delivered 3.0-7.2% gains in peptide detections at 1% FDR. On the LFQbench dataset, a mixed-species dataset with known ratios, Alpha-Tri increased the numbers of peptides and proteins falling within the valid ratio ranges by up to 8.6% and 7.6%, respectively, compared with DIA-NN alone.
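The triple-spectrum described above could be assembled roughly as follows. This is a minimal sketch under assumptions the abstract does not state: the reference elution profile, used both to locate the apex and as the partner for correlation, defaults here to the summed extracted-ion chromatogram (XIC) of all candidate fragment ions; the correlation is Pearson; and the helper names `pearson` and `build_triple_spectrum` are hypothetical. In Alpha-Tri the resulting 3 x N array is what the neural network consumes; that network is not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def build_triple_spectrum(predicted, xics, reference=None):
    """Return [predicted, measured, correlation], one value per fragment ion.

    predicted: predicted intensities (e.g. from Prosit), aligned with xics.
    xics: one XIC (intensity per scan) per candidate fragment ion, all
          sampled over the same scans.
    reference: elution profile for apex picking and correlation.
               ASSUMPTION: defaults to the summed XIC of all ions.
    """
    if len(predicted) != len(xics):
        raise ValueError("one predicted intensity per XIC is required")
    n_scans = len(xics[0])
    if reference is None:
        reference = [sum(xic[i] for xic in xics) for i in range(n_scans)]
    # Apex scan: where the reference elution profile peaks.
    apex = max(range(n_scans), key=lambda i: reference[i])
    # Measured spectrum: each fragment's intensity at the apex scan.
    measured = [xic[apex] for xic in xics]
    # Correlation spectrum: how well each fragment co-elutes with the reference,
    # used as a proxy for whether the ion is really present.
    correlation = [pearson(xic, reference) for xic in xics]
    return [list(predicted), measured, correlation]
```

For example, `build_triple_spectrum([0.5, 0.3, 0.2], [[0, 2, 6, 2, 0], [0, 1, 3, 1, 0], [5, 5, 5, 5, 5]])` treats the flat third chromatogram as a non-co-eluting ion: it still contributes an apex intensity, but its correlation entry is 0.0, which is the signal a learned model could use to down-weight it.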
The original benchmark datasets were downloaded from ProteomeXchange under the identifiers PXD005573, PXD000954 and PXD002952. Source code is available at https://github.com/YuAirLab/Alpha-Tri.