Using the entrapment sequence method as a standard to evaluate key steps of proteomics data analysis process.

Author information

Feng Xiao-Dong, Li Li-Wei, Zhang Jian-Hong, Zhu Yun-Ping, Chang Cheng, Shu Kun-Xian, Ma Jie

Affiliations

Chongqing University of Posts and Telecommunications, 2 Chong Wen Road of Nan'an District, Chongqing, 400065, China.

Department of Bioinformatics, State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Engineering Research Center for Protein Drugs, National Center for Protein Sciences (Beijing), Beijing Institute of Radiation Medicine, 38 Life Science Park Road, Beijing, 102206, China.

Publication information

BMC Genomics. 2017 Mar 14;18(Suppl 2):143. doi: 10.1186/s12864-017-3491-2.

DOI:10.1186/s12864-017-3491-2

PMID:28361671

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5374549/

Abstract

BACKGROUND

The mass spectrometry-based technical pipeline has provided a high-throughput, high-sensitivity and high-resolution platform for post-genomic biology. Different tools implement a variety of models and algorithms to improve proteomics data analysis. The target-decoy search strategy has become the most popular way to control false identifications of peptides and proteins. While this strategy can estimate the false discovery rate (FDR) within a dataset, it cannot directly evaluate the false positive matches among the target identifications.
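As a rough illustration of the target-decoy idea described above, the sketch below estimates the FDR at a score threshold as the ratio of accepted decoy matches to accepted target matches. The function name, the tuple layout, and the simple decoys/targets ratio are assumptions made for this example; they are not taken from the paper, and individual tools use their own variants of the formula.

```python
from typing import List, Tuple


def target_decoy_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate the FDR among matches scoring at or above `threshold`.

    Each item is (score, is_decoy); higher scores are assumed to be better.
    The decoys/targets ratio is one common convention (an assumption here).
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so there is nothing to be wrong about
    return decoys / targets


# Hypothetical peptide-spectrum matches: (score, is_decoy)
example_psms = [
    (9.1, False),
    (8.7, False),
    (8.2, False),
    (7.9, True),
    (7.5, False),
    (6.8, True),
]

fdr_at_7 = target_decoy_fdr(example_psms, 7.0)  # 1 decoy / 4 targets
```

Note that the estimate only says how many accepted target matches are expected to be wrong; it does not say which ones, which is the gap the entrapment approach is meant to address.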

RESULTS

As a supplement to the target-decoy strategy, the entrapment sequence method was introduced to assess key steps of the mass spectrometry data analysis process, namely database search engines and quality control methods. Using the entrapment sequences as the standard, we evaluated five database search engines on both their original scores and their reprocessed scores, as well as four quality control methods, in terms of both quantity and quality. Our results showed that the recently developed search engine MS-GF+ and the Percolator-embedded quality control method PepDistiller performed best among the search engines and the quality control methods, respectively. Combined with efficient quality control methods, the search engines can overcome the low sensitivity of their original scores. Moreover, based on the entrapment sequence method, we showed that filtering the identifications separately could increase the number of identified peptides while improving the confidence level.
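The entrapment idea can be sketched as follows: sequences known to be absent from the sample are added to the target database, so any accepted target match that lands on one of them is a directly observable false positive. The code below is a minimal sketch under that assumption. The labels "sample", "entrapment" and "decoy", the function name, and the plain ratio are hypothetical choices for illustration; the abstract does not specify the exact metric used in the paper.

```python
from collections import Counter
from typing import Iterable, Tuple


def entrapment_hit_rate(psms: Iterable[Tuple[float, str]], threshold: float) -> float:
    """Fraction of accepted target matches that hit entrapment sequences.

    Each item is (score, source), where source is "sample", "entrapment"
    or "decoy" (hypothetical labels). Decoy matches are ignored here because
    only target identifications are being assessed.
    """
    counts = Counter(source for score, source in psms if score >= threshold)
    accepted_targets = counts["sample"] + counts["entrapment"]
    if accepted_targets == 0:
        return 0.0
    return counts["entrapment"] / accepted_targets


# Hypothetical matches: (score, source)
labelled_psms = [
    (9.1, "sample"),
    (8.7, "sample"),
    (8.2, "entrapment"),
    (7.9, "decoy"),
    (7.5, "sample"),
    (6.8, "decoy"),
]

rate_at_7 = entrapment_hit_rate(labelled_psms, 7.0)  # 1 entrapment / 4 targets
```

Comparing this observed rate across search engines or quality control methods at the same nominal FDR gives one way to rank them on quality, while the count of accepted "sample" matches gives the quantity side.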

CONCLUSION

In this study, we have shown that the entrapment sequence method can be a useful strategy for assessing the key steps of the mass spectrometry data analysis process. Its applications can be extended to all steps of the common workflow, such as protein assembly methods and data integration methods.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8a55/5374549/71104410d5d9/12864_2017_3491_Fig1_HTML.jpg
