Improved classification of mass spectrometry database search results using newer machine learning approaches.

Authors

Ulintz Peter J, Zhu Ji, Qin Zhaohui S, Andrews Philip C

Affiliation

National Resource for Proteomics and Pathways, School of Public Health, University of Michigan, Ann Arbor, Michigan 48109, USA.

Publication information

Mol Cell Proteomics. 2006 Mar;5(3):497-509. doi: 10.1074/mcp.M500233-MCP200. Epub 2005 Nov 30.

DOI:10.1074/mcp.M500233-MCP200

PMID:16321970

Abstract

Manual analysis of mass spectrometry data is a current bottleneck in high throughput proteomics. In particular, the need to manually validate the results of mass spectrometry database searching algorithms can be prohibitively time-consuming. Development of software tools that attempt to quantify the confidence in the assignment of a protein or peptide identity to a mass spectrum is an area of active interest. We sought to extend work in this area by investigating the potential of recent machine learning algorithms both to improve the accuracy of these approaches and to serve as a flexible framework for accommodating new data features. Specifically, we demonstrated the ability of boosting and random forest approaches to improve the discrimination of true hits from false positive identifications in the results of mass spectrometry database search engines compared with thresholding and other machine learning approaches. We accommodated additional attributes obtainable from database search results, including a factor addressing proton mobility. Performance was evaluated using publicly available electrospray data and a new collection of MALDI data generated from purified human reference proteins.

Similar articles

Improved classification of mass spectrometry database search results using newer machine learning approaches.

Mol Cell Proteomics. 2006 Mar;5(3):497-509. doi: 10.1074/mcp.M500233-MCP200. Epub 2005 Nov 30.

Improving the reliability and throughput of mass spectrometry-based proteomics by spectrum quality filtering.

Proteomics. 2006 Apr;6(7):2086-94. doi: 10.1002/pmic.200500309.

Peak intensity prediction in MALDI-TOF mass spectrometry: a machine learning study to support quantitative proteomics.

BMC Bioinformatics. 2008 Oct 20;9:443. doi: 10.1186/1471-2105-9-443.

CHOMPER: a bioinformatic tool for rapid validation of tandem mass spectrometry search results associated with high-throughput proteomic strategies.

Proteomics. 2002 Sep;2(9):1097-103. doi: 10.1002/1615-9861(200209)2:9<1097::AID-PROT1097>3.0.CO;2-X.

Feature selection and nearest centroid classification for protein mass spectrometry.

BMC Bioinformatics. 2005 Mar 23;6:68. doi: 10.1186/1471-2105-6-68.

Support vector machines for improved peptide identification from tandem mass spectrometry database search.

Methods Mol Biol. 2009;492:453-60. doi: 10.1007/978-1-59745-493-3_28.

In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics.

J Proteomics. 2017 Jan 6;150:170-182. doi: 10.1016/j.jprot.2016.08.002. Epub 2016 Aug 4.

Sequit: software for de novo peptide sequencing by matrix-assisted laser desorption/ionization post-source decay mass spectrometry.

Rapid Commun Mass Spectrom. 2004;18(8):907-13. doi: 10.1002/rcm.1420.

Peptide reranking with protein-peptide correspondence and precursor peak intensity information.

IEEE/ACM Trans Comput Biol Bioinform. 2012 Jul-Aug;9(4):1212-9. doi: 10.1109/TCBB.2012.29.

Technical innovations for the automated identification of gel-separated proteins by MALDI-TOF mass spectrometry.

Anal Bioanal Chem. 2006 Sep;386(1):92-103. doi: 10.1007/s00216-006-0592-1. Epub 2006 Jul 5.

Cited by

Array-Based Machine Learning for Functional Group Detection in Electron Ionization Mass Spectrometry.

ACS Omega. 2023 Jun 29;8(27):24341-24350. doi: 10.1021/acsomega.3c01684. eCollection 2023 Jul 11.

Mitigating Cold Start Problem in Serverless Computing with Function Fusion.

Sensors (Basel). 2021 Dec 16;21(24):8416. doi: 10.3390/s21248416.

Deep learning for peptide identification from metaproteomics datasets.

J Proteomics. 2021 Sep 15;247:104316. doi: 10.1016/j.jprot.2021.104316. Epub 2021 Jul 8.

Novel Human miRNA-Disease Association Inference Based on Random Forest.

Mol Ther Nucleic Acids. 2018 Dec 7;13:568-579. doi: 10.1016/j.omtn.2018.10.005. Epub 2018 Oct 11.

Performance Investigation of Proteomic Identification by HCD/CID Fragmentations in Combination with High/Low-Resolution Detectors on a Tribrid, High-Field Orbitrap Instrument.

PLoS One. 2016 Jul 29;11(7):e0160160. doi: 10.1371/journal.pone.0160160. eCollection 2016.

Optimization of Search Engines and Postprocessing Approaches to Maximize Peptide and Protein Identification for High-Resolution Mass Data.

J Proteome Res. 2015 Nov 6;14(11):4662-73. doi: 10.1021/acs.jproteome.5b00536. Epub 2015 Sep 30.

Learning from decoys to improve the sensitivity and specificity of proteomics database search results.

PLoS One. 2012;7(11):e50651. doi: 10.1371/journal.pone.0050651. Epub 2012 Nov 26.

An improved machine learning protocol for the identification of correct Sequest search results.

BMC Bioinformatics. 2010 Dec 7;11:591. doi: 10.1186/1471-2105-11-591.

A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics.

J Proteomics. 2010 Oct 10;73(11):2092-123. doi: 10.1016/j.jprot.2010.08.009. Epub 2010 Sep 8.

Toward digital staining using imaging mass spectrometry and random forests.

J Proteome Res. 2009 Jul;8(7):3558-67. doi: 10.1021/pr900253y.
