pValid 2: A deep learning based validation method for peptide identification in shotgun proteomics with increased discriminating power.

Affiliations

Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China; University of Chinese Academy of Sciences, Beijing, China.

Publication information

J Proteomics. 2022 Jan 16;251:104414. doi: 10.1016/j.jprot.2021.104414. Epub 2021 Nov 2.

Abstract

Tandem mass spectrometry has been the principal method for peptide and protein identification in shotgun proteomics. However, which of the identifications reported by proteome search engines are incorrect remains unknown, and further validation methods are needed. We previously proposed a validation method, pValid, but its scope of application is limited because two of its features are tied to open database search and to sub-optimal peptide candidates for tandem mass spectra, and its performance on complex datasets still has room for improvement. In this study, we developed a more comprehensive validation method, pValid 2, which breaks these limitations by removing the two features and introducing a new feature based on the retention time predicted by pPredRT, a deep learning-based method. On three testing datasets, pValid 2 yielded an average false positive rate of 0.03% and an average false negative rate of 1.37%, better than those of pValid, and on two complex datasets it flagged 8.47% to 11.31% more incorrect identifications than pValid. Moreover, pValid 2 flagged almost all decoy identifications when validating the open-search datasets. In addition, pValid 2 can validate identifications given by MaxQuant and MS-GF+, and the validation results showed that it performed dramatically better than three metabolic labeling validation methods. Considering also its cost-effectiveness as a purely computational approach, pValid 2 has the potential to become a widely used tool for validating peptide identifications from any proteome search engine in shotgun proteomics.

SIGNIFICANCE: Identification results from shotgun proteomics are vital to life science research. The correctness of identifications strongly affects the precision of subsequent studies of protein structure and function, protein-protein interactions, pathogenic mechanisms, and targeted drugs. Thus, validating the correctness of identifications is crucial and urgent. In 2019, we developed an identification credibility validation method named pValid, whose false positive rate (FPR, 0.03%) and false negative rate (FNR, 1.79%) are comparable to those of the gold standard, i.e., the synthetic-peptide validation method. However, pValid can only validate results from pFind, and its validation performance on a few complex datasets still has room for improvement. Therefore, in this submission, we propose pValid 2, a more comprehensive computational validation method that can validate identifications from any proteome search engine with increased discriminating power.
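The abstract reports validation quality as a false positive rate and a false negative rate averaged over several testing datasets, but it does not state how either quantity is computed. The sketch below assumes the textbook confusion-matrix definitions, FPR = FP / (FP + TN) and FNR = FN / (FN + TP), and assumes that "average" means an unweighted mean of per-dataset rates. Which outcome pValid 2 treats as "positive", and how its averages were actually taken, are not given here. The function names `error_rates` and `macro_average` and the counts in the usage example are hypothetical illustrations, not code or data from the paper.

```python
"""Minimal sketch: false positive / false negative rates for a validation method.

Assumptions (not taken from the pValid 2 paper): textbook confusion-matrix
definitions and an unweighted (macro) mean across datasets.
"""
from typing import Iterable, Tuple


def error_rates(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Return (FPR, FNR) from confusion-matrix counts.

    FPR = FP / (FP + TN): share of actual negatives wrongly called positive.
    FNR = FN / (FN + TP): share of actual positives wrongly called negative.
    An empty denominator yields 0.0 instead of raising ZeroDivisionError.
    """
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


def macro_average(
    datasets: Iterable[Tuple[int, int, int, int]],
) -> Tuple[float, float]:
    """Unweighted mean of per-dataset (FPR, FNR); each tuple is (tp, fp, tn, fn)."""
    rates = [error_rates(*counts) for counts in datasets]
    if not rates:
        raise ValueError("need at least one dataset")
    n = len(rates)
    return (sum(r[0] for r in rates) / n, sum(r[1] for r in rates) / n)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not results from the paper.
    toy = [(90, 1, 999, 10), (45, 0, 500, 5), (180, 2, 1998, 20)]
    fpr, fnr = macro_average(toy)
    print(f"average FPR = {fpr:.4%}, average FNR = {fnr:.2%}")
```

Under these definitions, a rate of 0.03% corresponds to roughly 3 false positives per 10,000 actual negatives. A pooled (micro) average, which sums counts across datasets before dividing, would instead weight larger datasets more heavily; the abstract does not say which convention was used.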