A Protein Standard That Emulates Homology for the Characterization of Protein Inference Algorithms.

Affiliations

Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health , KTH - Royal Institute of Technology , Box 1031 , 17121 Solna , Sweden.

European Molecular Biology Laboratory , European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus , Hinxton, Cambridge CB10 1SD , United Kingdom.

Publication information

J Proteome Res. 2018 May 4;17(5):1879-1886. doi: 10.1021/acs.jproteome.7b00899. Epub 2018 Apr 16.

DOI:10.1021/acs.jproteome.7b00899

PMID:29631402

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6474350/

Abstract

A natural way to benchmark the performance of an analytical experimental setup is to use samples of known composition and see to what degree one can correctly infer the content of such a sample from the data. For shotgun proteomics, one of the inherent problems of interpreting data is that the measured analytes are peptides and not the actual proteins themselves. As some proteins share proteolytic peptides, there might be more than one possible causative set of proteins resulting in a given set of peptides and there is a need for mechanisms that infer proteins from lists of detected peptides. A weakness of commercially available samples of known content is that they consist of proteins that are deliberately selected for producing tryptic peptides that are unique to a single protein. Unfortunately, such samples do not expose any complications in protein inference. Hence, for a realistic benchmark of protein inference procedures, there is a need for samples of known content where the present proteins share peptides with known absent proteins. Here, we present such a standard, that is based on E. coli expressed human protein fragments. To illustrate the application of this standard, we benchmark a set of different protein inference procedures on the data. We observe that inference procedures excluding shared peptides provide more accurate estimates of errors compared to methods that include information from shared peptides, while still giving a reasonable performance in terms of the number of identified proteins. We also demonstrate that using a sample of known protein content without proteins with shared tryptic peptides can give a false sense of accuracy for many protein inference methods.
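The ambiguity described in the abstract can be illustrated with a minimal sketch. It is a toy example, not the inference procedures benchmarked in the paper: the protein names, peptide sequences, and both helper functions are hypothetical, and the "include shared peptides" rule shown here is deliberately naive (real methods that use shared peptides weigh the evidence rather than accepting every match).

```python
from collections import defaultdict


def peptide_to_proteins(protein_peptides):
    """Invert a protein -> peptides map into peptide -> set of proteins."""
    index = defaultdict(set)
    for protein, peptides in protein_peptides.items():
        for pep in peptides:
            index[pep].add(protein)
    return dict(index)


def infer_unique_only(detected, index):
    """Report a protein only if some detected peptide maps to it alone."""
    return {
        next(iter(index[pep]))
        for pep in detected
        if pep in index and len(index[pep]) == 1
    }


def infer_with_shared(detected, index):
    """Naively report every protein that any detected peptide maps to."""
    found = set()
    for pep in detected:
        found |= index.get(pep, set())
    return found


# Hypothetical toy database: protein A is present in the sample,
# protein B is a known-absent homolog that shares one peptide with A.
database = {
    "A": {"PEPTIDEK", "SHAREDR", "ONLYAK"},
    "B": {"SHAREDR", "ONLYBK"},
}
present = {"A"}
detected = {"PEPTIDEK", "SHAREDR"}

index = peptide_to_proteins(database)
unique_calls = infer_unique_only(detected, index)
shared_calls = infer_with_shared(detected, index)

# Because the sample composition is known, false positives can be counted.
false_positives_unique = unique_calls - present
false_positives_shared = shared_calls - present
```

In this toy case the unique-peptide rule reports only the present protein, while the naive shared-peptide rule also reports the absent homolog. A standard made only of proteins without shared peptides could not reveal that difference, which is the gap the abstract says the proposed standard addresses.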

Similar articles

In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics.

J Proteomics. 2017 Jan 6;150:170-182. doi: 10.1016/j.jprot.2016.08.002. Epub 2016 Aug 4.

Protein Inference Using PIA Workflows and PSI Standard File Formats.

J Proteome Res. 2019 Feb 1;18(2):741-747. doi: 10.1021/acs.jproteome.8b00723. Epub 2018 Dec 5.

A linear programming model for protein inference problem in shotgun proteomics.

Bioinformatics. 2012 Nov 15;28(22):2956-62. doi: 10.1093/bioinformatics/bts540. Epub 2012 Sep 6.

IsoformResolver: A peptide-centric algorithm for protein inference.

J Proteome Res. 2011 Jul 1;10(7):3060-75. doi: 10.1021/pr200039p. Epub 2011 Jun 7.

Integrated approach for manual evaluation of peptides identified by searching protein sequence databases with tandem mass spectra.

J Proteome Res. 2005 May-Jun;4(3):998-1005. doi: 10.1021/pr049754t.

How to talk about protein-level false discovery rates in shotgun proteomics.

Proteomics. 2016 Sep;16(18):2461-9. doi: 10.1002/pmic.201500431.

A multidimensional liquid chromatography-tandem mass spectrometry platform to improve protein identification in high-throughput shotgun proteomics.

J Chromatogr A. 2017 May 19;1498:176-182. doi: 10.1016/j.chroma.2017.03.032. Epub 2017 Mar 18.

Advancement in protein inference from shotgun proteomics using peptide detectability.

Pac Symp Biocomput. 2007:409-20.

Combining De Novo Peptide Sequencing Algorithms, A Synergistic Approach to Boost Both Identifications and Confidence in Bottom-up Proteomics.

J Proteome Res. 2017 Sep 1;16(9):3209-3218. doi: 10.1021/acs.jproteome.7b00198. Epub 2017 Aug 22.

Cited by

Quality Control in the Mass Spectrometry Proteomics Core: A Practical Primer.

J Biomol Tech. 2024 Sep 12;35(3). doi: 10.7171/3fc1f5fe.42308a9a. eCollection 2024 Sep 30.

Classification of Collagens via Peptide Ambiguation, in a Paleoproteomic LC-MS/MS-Based Taxonomic Pipeline.

J Proteome Res. 2025 Apr 4;24(4):1907-1925. doi: 10.1021/acs.jproteome.4c00962. Epub 2025 Mar 13.

Comprehensive Protein Inference Analysis with PyProteinInference Elucidates Biological Understanding of Tandem Mass Spectrometry Data.

J Proteome Res. 2025 Apr 4;24(4):2135-2140. doi: 10.1021/acs.jproteome.4c00734. Epub 2025 Feb 28.

Relative quantification of proteins and post-translational modifications in proteomic experiments with shared peptides: a weight-based approach.

Bioinformatics. 2025 Mar 4;41(3). doi: 10.1093/bioinformatics/btaf046.

Observations from the Proteomics Bench.

Proteomes. 2024 Feb 6;12(1):6. doi: 10.3390/proteomes12010006.

ProInfer: An interpretable protein inference tool leveraging on biological networks.

PLoS Comput Biol. 2023 Mar 17;19(3):e1010961. doi: 10.1371/journal.pcbi.1010961. eCollection 2023 Mar.

Enhanced protein isoform characterization through long-read proteogenomics.

Genome Biol. 2022 Mar 3;23(1):69. doi: 10.1186/s13059-022-02624-y.

EPIFANY: A Method for Efficient High-Confidence Protein Inference.

J Proteome Res. 2020 Mar 6;19(3):1060-1072. doi: 10.1021/acs.jproteome.9b00566. Epub 2020 Feb 13.

A Review of the Scientific Rigor, Reproducibility, and Transparency Studies Conducted by the ABRF Research Groups.

J Biomol Tech. 2020 Apr;31(1):11-26. doi: 10.7171/jbt.20-3101-003.

Beyond mass spectrometry, the next step in proteomics.

Sci Adv. 2020 Jan 10;6(2):eaax8978. doi: 10.1126/sciadv.aax8978. eCollection 2020 Jan.
