

An SVM scorer for more sensitive and reliable peptide identification via tandem mass spectrometry.

Authors

Wang Haipeng, Fu Yan, Sun Ruixiang, He Simin, Zeng Rong, Gao Wen

Affiliation

Digital Technology Lab, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China.

Publication

Pac Symp Biocomput. 2006:303-14.

PMID: 17094248

Abstract

Tandem mass spectrometry (MS/MS) has become increasingly important and indispensable in high-throughput proteomics for identifying complex protein mixtures. Database searching is the standard method to accomplish this purpose. A key sub-routine, peptide identification, is used to generate a list of candidate peptides from a protein database according to an experimental MS/MS spectrum, and then validate these candidate peptides for protein identification. Although currently there are many algorithms for peptide identification, most of them either lack an effective validation module or only validate the first-ranked peptide, thus leading to a low identification reliability or sensitivity. This paper proposes a new algorithm, named pepReap, to overcome the above drawbacks. It consists of a two-layered scoring scheme based on machine learning. The first layer is a rough scoring function which uses some simple and heuristic factors to measure the degree of the matches between an experimental MS/MS spectrum and the candidate peptides; thus a ranked list of candidate peptides is generated at a relatively low computational cost. The second layer is a fine scoring function which re-ranks the candidate peptides generated in the first layer and determines which one among them is the true positive. The fine scoring function was designed based on support vector machines (SVMs) using more comprehensive factors, such as the correlations between ions, the mass matching errors of fragment and peptide ions, etc. Consequently, the SVM classifier serves as not only a scorer but also a validation module. Experimental comparison with the popular SEQUEST algorithm coupled with threshold validation criteria on a reported dataset demonstrates that the pepReap algorithm achieves higher performance in terms of identification sensitivity with comparable precision.
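The two-layer scheme described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the pepReap implementation: the `Candidate` type, the shared-peak-count rough score, the four features, the linear decision function, and the `WEIGHTS` and `BIAS` values are all assumptions made for the example (the abstract does not specify the features in detail, the kernel, or any trained parameters). The m/z values and peptide strings are made-up toy data.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Candidate:
    peptide: str                  # hypothetical peptide label
    theoretical_mz: List[float]   # toy fragment m/z values, sorted ascending
    precursor_error: float        # toy peptide-ion mass error in Da


def matched_peaks(spectrum_mz: Sequence[float], theoretical_mz: Sequence[float],
                  tol: float = 0.5) -> List[Tuple[float, float]]:
    """Pair each theoretical fragment with its nearest observed peak within tol."""
    if not spectrum_mz:
        return []
    pairs = []
    for t in theoretical_mz:
        nearest = min(spectrum_mz, key=lambda o: abs(o - t))
        if abs(nearest - t) <= tol:
            pairs.append((t, nearest))
    return pairs


def rough_score(spectrum_mz: Sequence[float], cand: Candidate, tol: float = 0.5) -> int:
    """Layer 1 (assumed form): a cheap heuristic, here the shared peak count."""
    return len(matched_peaks(spectrum_mz, cand.theoretical_mz, tol))


def fine_features(spectrum_mz: Sequence[float], cand: Candidate,
                  tol: float = 0.5) -> List[float]:
    """Layer 2 inputs (assumed): match fraction, mean fragment error,
    consecutive-match fraction as a crude ion-correlation proxy, precursor error."""
    pairs = matched_peaks(spectrum_mz, cand.theoretical_mz, tol)
    n = len(cand.theoretical_mz)
    frac = len(pairs) / n if n else 0.0
    mean_err = sum(abs(o - t) for t, o in pairs) / len(pairs) if pairs else tol
    matched = {t for t, _ in pairs}
    neighbours = list(zip(cand.theoretical_mz, cand.theoretical_mz[1:]))
    consecutive = sum(1 for a, b in neighbours if a in matched and b in matched)
    consec_frac = consecutive / len(neighbours) if neighbours else 0.0
    return [frac, mean_err, consec_frac, cand.precursor_error]


# Hypothetical parameters standing in for a trained SVM; not from the paper.
WEIGHTS = [4.0, -2.0, 2.0, -1.0]
BIAS = -3.0


def svm_decision(features: Sequence[float]) -> float:
    """Linear SVM decision value w.x + b; positive means 'accept as correct'."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def identify(spectrum_mz: Sequence[float], candidates: Sequence[Candidate],
             top_k: int = 3, tol: float = 0.5) -> Optional[str]:
    """Rank by rough score, re-rank the top_k by SVM score, then validate."""
    ranked = sorted(candidates, key=lambda c: rough_score(spectrum_mz, c, tol),
                    reverse=True)[:top_k]
    if not ranked:
        return None
    scored = [(svm_decision(fine_features(spectrum_mz, c, tol)), c) for c in ranked]
    best_score, best = max(scored, key=lambda sc: sc[0])
    # The same decision value both re-ranks and validates the identification.
    return best.peptide if best_score > 0 else None


SPECTRUM = [100.0, 200.0, 300.0, 400.0, 500.0]
CAND_A = Candidate("PEPTIDEK", [100.1, 200.1, 300.0, 400.2], 0.1)
CAND_B = Candidate("SAMPLER", [100.45, 200.45, 300.45, 400.45, 500.45,
                               650.0, 700.0, 800.0, 900.0, 950.0], 2.0)

print(identify(SPECTRUM, [CAND_A, CAND_B]))  # prints PEPTIDEK
```

In this toy data the rough layer ranks `SAMPLER` first (five shared peaks against four), while the fine layer prefers `PEPTIDEK` because its matches are tighter and cover all of its fragments. That is the situation the abstract points to when it notes that validating only the first-ranked peptide can cost sensitivity.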


Similar articles

1. An SVM scorer for more sensitive and reliable peptide identification via tandem mass spectrometry.
   Pac Symp Biocomput. 2006:303-14.
2. Support vector machines for improved peptide identification from tandem mass spectrometry database search.
   Methods Mol Biol. 2009;492:453-60. doi: 10.1007/978-1-59745-493-3_28.
3. RT-PSM, a real-time program for peptide-spectrum matching with statistical significance.
   Rapid Commun Mass Spectrom. 2006;20(8):1199-208. doi: 10.1002/rcm.2435.
4. Mining tandem mass spectral data to develop a more accurate mass error model for peptide identification.
   Pac Symp Biocomput. 2007:421-32.
5. MSNovo: a dynamic programming algorithm for de novo peptide sequencing via tandem mass spectrometry.
   Anal Chem. 2007 Jul 1;79(13):4870-8. doi: 10.1021/ac070039n. Epub 2007 Jun 6.
6. A machine learning approach to predicting peptide fragmentation spectra.
   Pac Symp Biocomput. 2006:219-30.
7. [A novel approach for peptide identification by tandem mass spectrometry].
   Sheng Wu Hua Xue Yu Sheng Wu Wu Li Xue Bao (Shanghai). 2003 Aug;35(8):734-40.
8. A support for the identification of non-tryptic peptides based on low resolution tandem and sequential mass spectrometry data: the INSPIRE software.
   Anal Chim Acta. 2012 Mar 9;718:70-7. doi: 10.1016/j.aca.2012.01.001. Epub 2012 Jan 11.
9. Integrated approach for manual evaluation of peptides identified by searching protein sequence databases with tandem mass spectra.
   J Proteome Res. 2005 May-Jun;4(3):998-1005. doi: 10.1021/pr049754t.
10. Improving peptide identification with single-stage mass spectrum peaks.
    Bioinformatics. 2009 Nov 15;25(22):2969-74. doi: 10.1093/bioinformatics/btp501. Epub 2009 Aug 18.

Cited by

1. Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics.
   J Proteome Res. 2015 May 1;14(5):1993-2001. doi: 10.1021/pr501138h. Epub 2015 Apr 22.
2. Sequential projection pursuit principal component analysis--dealing with missing data associated with new -omics technologies.
   Biotechniques. 2013 Mar;54(3):165-8. doi: 10.2144/000113978.
3. Penalized feature selection and classification in bioinformatics.
   Brief Bioinform. 2008 Sep;9(5):392-403. doi: 10.1093/bib/bbn027. Epub 2008 Jun 18.
4. Computational methods for protein identification from mass spectrometry data.
   PLoS Comput Biol. 2008 Feb;4(2):e12. doi: 10.1371/journal.pcbi.0040012.