MUMAL: multivariate analysis in shotgun proteomics using machine learning techniques.

Affiliation

Department of Informatics, Federal University of Viçosa, 36570-000 Minas Gerais, Brazil.

Publication information

BMC Genomics. 2012;13 Suppl 5(Suppl 5):S4. doi: 10.1186/1471-2164-13-S5-S4. Epub 2012 Oct 19.

DOI:10.1186/1471-2164-13-S5-S4

PMID:23095859

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3477001/

Abstract

BACKGROUND

The shotgun strategy (liquid chromatography coupled with tandem mass spectrometry) is widely applied to identify proteins in complex mixtures. A single run produces thousands of spectra, which are interpreted by computational tools. These tools normally extract peptide sequences from a protein database and match them against the experimentally derived mass spectra. After the database search, the correctness of the resulting peptide-spectrum matches (PSMs) must also be evaluated algorithmically, because manual curation of such large datasets would be impractical. The target-decoy database strategy is widely used for this evaluation. Nonetheless, it has been applied without considering sensitivity, i.e., only error estimation is taken into account. A recently proposed method, MUDE, treats target-decoy analysis as an optimization problem in which sensitivity is maximized. This method significantly increases the number of PSMs retrieved at a fixed error rate. However, the MUDE model establishes linear decision boundaries to separate correct from incorrect PSMs. Moreover, the heuristic described for solving the optimization problem must be executed many times to achieve a significant gain in sensitivity.
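The target-decoy evaluation described above can be sketched for the single-score case. This is a minimal illustration, not MUDE or MUMAL, assuming one score per PSM (higher is better) and the common estimator FDR ≈ #decoys / #targets above a threshold. The function name `max_targets_at_fdr`, the 1% default limit, and the toy data are hypothetical. The function returns the threshold that accepts the most target PSMs while the estimated FDR stays within the limit, i.e. the sensitivity-at-fixed-error-rate objective the abstract attributes to MUDE, reduced here to a single score.

```python
from typing import List, Tuple


def max_targets_at_fdr(psms: List[Tuple[float, bool]],
                       fdr_limit: float = 0.01) -> Tuple[float, int]:
    """Pick the score threshold that accepts the most target PSMs
    while the estimated FDR stays at or below fdr_limit.

    psms: (score, is_decoy) pairs; higher scores are assumed better.
    Assumed estimator: FDR(t) = #decoys(score >= t) / #targets(score >= t).
    Returns (threshold, n_targets), or (inf, 0) if no threshold qualifies.
    """
    ordered = sorted(psms, key=lambda p: p[0], reverse=True)
    best_threshold, best_targets = float("inf"), 0
    targets = decoys = 0
    for i, (score, is_decoy) in enumerate(ordered):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate only after the last PSM with this score, so tied
        # scores are always accepted or rejected together.
        if i + 1 < len(ordered) and ordered[i + 1][0] == score:
            continue
        if targets and decoys / targets <= fdr_limit and targets > best_targets:
            best_threshold, best_targets = score, targets
    return best_threshold, best_targets


# Toy data invented for illustration: (score, is_decoy)
toy = [(10, False), (9, False), (8, True), (7, False), (6, False), (5, True)]
print(max_targets_at_fdr(toy, fdr_limit=0.5))  # -> (6, 4)
```

Because every candidate threshold is scanned, the result is the most sensitive cut-off that still satisfies the error limit, rather than simply the first cut-off where the estimate crosses it.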

RESULTS

Here, we propose a new method for PSM assessment, termed MUMAL, that is based on machine learning techniques. Our method can establish nonlinear decision boundaries, increasing the chance of retrieving more true positives. Furthermore, it needs only a few iterations to achieve high sensitivity, strikingly shortening the running time of the whole process. Experiments show that our method retrieves considerably more PSMs than standard tools such as MUDE, PeptideProphet, and typical target-decoy approaches.
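Why a nonlinear boundary can matter is easiest to see on a toy case. The sketch below is an illustration under stated assumptions, not MUMAL's classifier (the abstract does not name the model): a plain perceptron trained on hypothetical two-feature points labelled +1 (correct PSM) and -1 (incorrect PSM), arranged in an XOR pattern. No straight line separates these points, so training on the raw features never converges. Appending the product feature x1*x2 (the hypothetical `lift` map) makes the classes linearly separable in the lifted space, which corresponds to a nonlinear boundary (the two axes) in the original two features.

```python
from typing import List, Sequence, Tuple


def train_perceptron(xs: List[Sequence[float]], ys: List[int],
                     epochs: int = 100) -> Tuple[List[float], float, bool]:
    """Classic perceptron; ys are +1/-1. Returns (weights, bias, converged)."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(xs, ys):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or exactly on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                errors += 1
        if errors == 0:  # a full pass without mistakes: data separated
            return w, b, True
    return w, b, False


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def lift(x: Sequence[float]) -> List[float]:
    """Hypothetical feature map: append the product x1*x2."""
    return [x[0], x[1], x[0] * x[1]]


# Toy XOR-like data (invented): +1 = "correct PSM", -1 = "incorrect PSM".
X = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
Y = [1, 1, -1, -1]

_, _, linear_ok = train_perceptron(X, Y)
w, b, lifted_ok = train_perceptron([lift(x) for x in X], Y)
print(linear_ok, lifted_ok)  # -> False True
```

The lifted model is still linear in (x1, x2, x1*x2); kernel methods and neural networks are common ways to obtain nonlinear boundaries without choosing such a feature by hand.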

CONCLUSION

Our approach not only enhances computational performance, and thus shortens the turnaround time of MS-based proteomics experiments, but also increases the information content through higher proteome coverage. For instance, this improvement increases the chance of identifying important drug targets or biomarkers for drug development or molecular diagnostics.

Similar articles

MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm.

BMC Bioinformatics. 2016 Dec 15;17(Suppl 18):472. doi: 10.1186/s12859-016-1341-x.

MUDE: a new approach for optimizing sensitivity in the target-decoy search strategy for large-scale peptide/protein identification.

J Proteome Res. 2010 May 7;9(5):2265-77. doi: 10.1021/pr901023v.

Decoy methods for assessing false positives and false discovery rates in shotgun proteomics.

Anal Chem. 2009 Jan 1;81(1):146-59. doi: 10.1021/ac801664q.

Improvements to the percolator algorithm for Peptide identification from shotgun proteomics data sets.

J Proteome Res. 2009 Jul;8(7):3737-45. doi: 10.1021/pr801109k.

Two-dimensional target decoy strategy for shotgun proteomics.

J Proteome Res. 2011 Dec 2;10(12):5296-301. doi: 10.1021/pr200780j. Epub 2011 Nov 7.

A cost-sensitive online learning method for peptide identification.

BMC Genomics. 2020 Apr 25;21(1):324. doi: 10.1186/s12864-020-6693-y.

A peptide-retrieval strategy enables significant improvement of quantitative performance without compromising confidence of identification.

J Proteomics. 2017 Jan 30;152:276-282. doi: 10.1016/j.jprot.2016.11.020. Epub 2016 Nov 27.

New mixture models for decoy-free false discovery rate estimation in mass spectrometry proteomics.

Bioinformatics. 2020 Dec 30;36(Suppl_2):i745-i753. doi: 10.1093/bioinformatics/btaa807.

AttnPep: A Self-Attention-Based Deep Learning Method for Peptide Identification in Shotgun Proteomics.

J Proteome Res. 2024 Feb 2;23(2):834-843. doi: 10.1021/acs.jproteome.3c00729. Epub 2024 Jan 22.

Cited by

OCCAM: prediction of small ORFs in bacterial genomes by means of a target-decoy database approach and machine learning techniques.

Database (Oxford). 2020 Jan 1;2020. doi: 10.1093/database/baaa067.

MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm.

BMC Bioinformatics. 2016 Dec 15;17(Suppl 18):472. doi: 10.1186/s12859-016-1341-x.

References

An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database.

J Am Soc Mass Spectrom. 1994 Nov;5(11):976-89. doi: 10.1016/1044-0305(94)80016-2.

Prediction and diagnosis of bladder cancer recurrence based on urinary content of hTERT, SENP1, PPP1CA, and MCM5 transcripts.

BMC Cancer. 2010 Nov 24;10:646. doi: 10.1186/1471-2407-10-646.

SENP1 induces prostatic intraepithelial neoplasia through multiple mechanisms.

J Biol Chem. 2010 Aug 13;285(33):25859-66. doi: 10.1074/jbc.M110.134874. Epub 2010 Jun 15.

QIKS--Quantitative identification of kinase substrates.

Proteomics. 2010 May;10(10):2015-25. doi: 10.1002/pmic.200900749.

MUDE: a new approach for optimizing sensitivity in the target-decoy search strategy for large-scale peptide/protein identification.

J Proteome Res. 2010 May 7;9(5):2265-77. doi: 10.1021/pr901023v.

Quantitative phosphoproteomics reveals widespread full phosphorylation site occupancy during mitosis.

Sci Signal. 2010 Jan 12;3(104):ra3. doi: 10.1126/scisignal.2000475.

Comparison of novel decoy database designs for optimizing protein identification searches using ABRF sPRG2006 standard MS/MS data sets.

J Proteome Res. 2009 Apr;8(4):1782-91. doi: 10.1021/pr800792z.

SeMoP: a new computational strategy for the unrestricted search for modified peptides using LC-MS/MS data.

J Proteome Res. 2008 Sep;7(9):4199-208. doi: 10.1021/pr800277y. Epub 2008 Aug 8.

Automatic validation of phosphopeptide identifications by the MS2/MS3 target-decoy search strategy.

J Proteome Res. 2008 Apr;7(4):1640-9. doi: 10.1021/pr700675j. Epub 2008 Mar 4.

Statistical learning of peptide retention behavior in chromatographic separations: a new kernel-based approach for computational proteomics.

BMC Bioinformatics. 2007 Nov 30;8:468. doi: 10.1186/1471-2105-8-468.
