Optimization of filtering criterion for SEQUEST database searching to improve proteome coverage in shotgun proteomics.

Author information

Jiang Xinning, Jiang Xiaogang, Han Guanghui, Ye Mingliang, Zou Hanfa

Affiliation

National Chromatographic R&A Center, Dalian Institute of Chemical Physics, The Chinese Academy of Sciences, Dalian 116023, China.

Publication information

BMC Bioinformatics. 2007 Aug 31;8:323. doi: 10.1186/1471-2105-8-323.

PMID:17761002

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2040164/

Abstract

BACKGROUND

In proteomic analysis, MS/MS spectra acquired by a mass spectrometer are assigned to peptides by database search algorithms such as SEQUEST. SEQUEST characterizes each peptide-to-spectrum assignment with several scores, including Xcorr, Delta Cn, Sp, Rsp and the matched ion count. Filtering criteria that combine several of these scores are used to separate correct identifications from random assignments. However, these filtering criteria have not been well optimized to date.
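To make the idea of a score-based filtering criterion concrete, the Python sketch below applies one to SEQUEST matches. It is illustrative only: the `PSM` and `Criterion` classes and `apply_filter` are hypothetical names, the criterion uses just three of the scores listed above (charge-specific Xcorr cut-offs, a ΔCn cut-off and an Rsp cut-off), and the threshold values are placeholders rather than values reported in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """One SEQUEST peptide-spectrum match, reduced to the scores used below."""
    charge: int      # precursor charge state
    xcorr: float     # cross-correlation score (Xcorr)
    delta_cn: float  # ΔCn: normalized Xcorr gap to the next-best candidate
    rsp: int         # Rsp: rank of the preliminary score Sp


@dataclass
class Criterion:
    """Hypothetical criterion: charge-specific Xcorr cut-offs plus ΔCn and Rsp cut-offs."""
    xcorr_min: dict           # charge state -> minimum Xcorr
    delta_cn_min: float = 0.1
    rsp_max: int = 5

    def accepts(self, psm: PSM) -> bool:
        threshold = self.xcorr_min.get(psm.charge)
        if threshold is None:  # charge states without a cut-off are rejected
            return False
        return (psm.xcorr >= threshold
                and psm.delta_cn >= self.delta_cn_min
                and psm.rsp <= self.rsp_max)


def apply_filter(psms, criterion):
    """Keep only the matches that satisfy every part of the criterion."""
    return [p for p in psms if criterion.accepts(p)]


# Placeholder thresholds for illustration only.
crit = Criterion(xcorr_min={1: 1.9, 2: 2.2, 3: 3.75})
psms = [PSM(2, 2.5, 0.15, 1), PSM(3, 3.0, 0.20, 1), PSM(1, 2.0, 0.05, 2)]
kept = apply_filter(psms, crit)
print(len(kept))  # 1: the +3 match fails Xcorr, the +1 match fails ΔCn
```

Each additional score simply adds one more condition to `accepts`, which is why the number of thresholds to tune grows quickly as more scores are combined.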

RESULTS

In this study, we implemented a machine learning approach, a predictive genetic algorithm (GA), to optimize filtering criteria so as to maximize the number of identified peptides at a fixed false discovery rate (FDR) for SEQUEST database searching. Because the FDR was determined directly by a decoy database search scheme, the GA-based optimization approach did not require any prior knowledge of the characteristics of the data set, which is a significant advantage over statistical approaches such as PeptideProphet. Compared with PeptideProphet, the GA-based approach can achieve similar performance in distinguishing true from false assignments in only one-tenth of the processing time. Moreover, because it does not rely on any assumptions about the data, the GA-based approach can easily be extended to process the results of other database search engines.
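The optimization described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' software: the genome encoding (three charge-specific Xcorr cut-offs plus one ΔCn cut-off), the search bounds, the GA operators (tournament selection, uniform crossover, Gaussian mutation, elitism) and all function names are hypothetical. The FDR is estimated with the concatenated target-decoy formula FDR = 2 × decoys / (targets + decoys), which is an assumption about how the decoy search is set up.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """A SEQUEST peptide-spectrum match from a combined target + decoy search."""
    charge: int
    xcorr: float
    delta_cn: float
    is_decoy: bool  # True if the peptide comes from the decoy (e.g. reversed) sequences


# Hypothetical genome: [Xcorr min for +1, Xcorr min for +2, Xcorr min for >= +3, ΔCn min]
BOUNDS = [(0.5, 5.0), (0.5, 5.0), (0.5, 6.0), (0.0, 0.5)]


def passes(psm, genome):
    xcorr_min = genome[min(psm.charge, 3) - 1]  # charges above +3 share the +3 cut-off
    return psm.xcorr >= xcorr_min and psm.delta_cn >= genome[3]


def estimate_fdr(n_target, n_decoy):
    """Concatenated target-decoy estimate: FDR = 2 * decoy / (target + decoy)."""
    total = n_target + n_decoy
    return 0.0 if total == 0 else 2.0 * n_decoy / total


def fitness(genome, psms, max_fdr):
    """Number of accepted target PSMs, or -1 if the estimated FDR exceeds max_fdr."""
    n_target = sum(1 for p in psms if not p.is_decoy and passes(p, genome))
    n_decoy = sum(1 for p in psms if p.is_decoy and passes(p, genome))
    return n_target if estimate_fdr(n_target, n_decoy) <= max_fdr else -1


def random_genome(rng):
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]


def crossover(a, b, rng):
    # Uniform crossover: each cut-off is inherited from either parent with p = 0.5.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(genome, rng, rate=0.2):
    child = []
    for value, (lo, hi) in zip(genome, BOUNDS):
        if rng.random() < rate:
            value += rng.gauss(0.0, 0.1 * (hi - lo))  # Gaussian step scaled to the range
        child.append(min(hi, max(lo, value)))  # keep the cut-off inside its bounds
    return child


def optimise(psms, max_fdr=0.01, pop_size=40, generations=60, seed=0):
    """Search for the criterion that maximises accepted targets at a fixed FDR."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((fitness(g, psms, max_fdr), g) for g in population),
                        key=lambda t: t[0], reverse=True)
        next_gen = [g for _, g in scored[:2]]  # elitism: carry over the two best
        while len(next_gen) < pop_size:
            # Tournament selection: best of three randomly drawn individuals.
            p1 = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            p2 = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            next_gen.append(mutate(crossover(p1, p2, rng), rng))
        population = next_gen
    best = max(population, key=lambda g: fitness(g, psms, max_fdr))
    return best, fitness(best, psms, max_fdr)


# Synthetic toy data: well-scoring targets, poorly scoring decoys (charge +2 only).
psms = ([PSM(2, 3.0 + 0.01 * i, 0.3, False) for i in range(50)]
        + [PSM(2, 1.0 + 0.01 * i, 0.05, True) for i in range(20)])
best, n_ids = optimise(psms, max_fdr=0.01)
print(n_ids, [round(v, 2) for v in best])
```

Because the fitness is computed purely from target and decoy counts, nothing in the search depends on a model of the score distributions; swapping in the scores of another search engine only requires changing `passes` and the genome bounds.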

CONCLUSION

Our results indicated that filtering criteria should be optimized individually for different samples. The newly developed GA-based software provides a convenient and fast way to create tailored optimal criteria for different proteome samples, thereby improving proteome coverage.
