MaCPepDB: A Database to Quickly Access All Tryptic Peptides of the UniProtKB.

Affiliations

Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center, 44801 Bochum, Germany.

Ruhr University Bochum, Center for Protein Diagnostics (PRODI), Medical Proteome Analysis, 44801 Bochum, Germany.

Publication information

J Proteome Res. 2021 Apr 2;20(4):2145-2150. doi: 10.1021/acs.jproteome.0c00967. Epub 2021 Mar 16.

DOI:10.1021/acs.jproteome.0c00967

PMID:33724838

Abstract

Protein sequence databases play a crucial role in the majority of the currently applied mass-spectrometry-based proteomics workflows. Here UniProtKB serves as one of the major sources, as it combines the information of several smaller databases and enriches the entries with additional biological information. For the identification of peptides in a sample by tandem mass spectra, as generated by data-dependent acquisition, protein sequence databases provide the basis for most spectrum identification search engines. In addition, for targeted proteomics approaches like selected reaction monitoring (SRM) and parallel reaction monitoring (PRM), knowledge of the peptide sequences, their masses, and whether they are unique for a protein is essential. Because most bottom-up proteomics approaches use trypsin to cleave the proteins in a sample, the tryptic peptides contained in a protein database are of great interest. We present a database, called MaCPepDB (mass-centric peptide database), that consists of the complete tryptic digest of the Swiss-Prot and TrEMBL parts of UniProtKB. The database is designed to support two kinds of queries: queries by peptide sequence, which return the connected proteins and thus show whether a peptide is unique, and queries by specific mass, for peptides or for precursors of MS/MS spectra. Furthermore, posttranslational modifications can be considered in a query, as well as different mass deviations for posttranslational modifications. Hence a sequence query can be used, for example, to check in which proteins of the UniProt database a tryptic peptide can be found, and a mass query can be used to find possibly interfering peptides in PRM/SRM experiments. The complete database currently contains 5 939 244 990 peptides from 185 561 610 proteins (UniProt version 2020_03), for which a single query usually takes less than 1 s. For easy exploration of the data, a web interface was developed. A REST application programming interface (API) for programmatic and workflow access is also available at https://macpepdb.mpc.rub.de.
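The two query types described in the abstract (by sequence and by mass) can be illustrated with a small in-memory sketch. This is not the MaCPepDB implementation or its API. The function names, the toy protein sequences, the digestion limits (two missed cleavages, peptide length 5 to 60), and the 5 ppm tolerance are all assumptions chosen for illustration. The cleavage rule used is the common one for trypsin: cut after K or R unless the next residue is P.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per peptide (free N- and C-terminus)


def cleave(sequence):
    """Split after K or R, except when the next residue is P."""
    fragments, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def digest(sequence, max_missed=2, min_len=5, max_len=60):
    """Return all peptides with up to max_missed missed cleavages (limits are assumed)."""
    fragments = cleave(sequence)
    peptides = set()
    for i in range(len(fragments)):
        for j in range(i, min(i + max_missed + 1, len(fragments))):
            pep = "".join(fragments[i:j + 1])
            if min_len <= len(pep) <= max_len:
                peptides.add(pep)
    return peptides


def peptide_mass(peptide, mods=None):
    """Monoisotopic mass; mods maps a residue to a fixed mass shift in Da."""
    mods = mods or {}
    return WATER + sum(RESIDUE_MASS[aa] + mods.get(aa, 0.0) for aa in peptide)


def build_index(proteins, max_missed=2):
    """Map each peptide to its proteins and keep a mass-sorted peptide list."""
    peptide_to_proteins = defaultdict(set)
    for accession, sequence in proteins.items():
        for pep in digest(sequence, max_missed):
            peptide_to_proteins[pep].add(accession)
    by_mass = sorted((peptide_mass(p), p) for p in peptide_to_proteins)
    return peptide_to_proteins, by_mass


def query_mass(by_mass, target, ppm=5.0):
    """Return peptides whose mass lies within +/- ppm of the target mass."""
    tol = target * ppm / 1e6
    masses = [m for m, _ in by_mass]
    lo = bisect_left(masses, target - tol)
    hi = bisect_right(masses, target + tol)
    return [p for _, p in by_mass[lo:hi]]


# Toy data (hypothetical accessions and sequences, not UniProt entries).
proteins = {"P1": "AAAAAKGGGGGR", "P2": "GGGGGRLLLLLK", "P3": "IIIIIKCCCCCR"}
index, by_mass = build_index(proteins, max_missed=0)

print(index["AAAAAK"])   # proteins containing this peptide (a single one: unique)
print(index["GGGGGR"])   # proteins containing this peptide (shared)
print(query_mass(by_mass, peptide_mass("LLLLLK")))  # peptides with matching mass
```

In this toy set the mass query for LLLLLK also returns IIIIIK, because leucine and isoleucine have the same mass. This is the kind of possibly interfering peptide that a mass query is meant to reveal for PRM/SRM experiments. Sorting by mass and searching with bisection is one simple way to make mass-range lookups fast; how MaCPepDB organizes its data internally is not stated in the abstract.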

Similar articles

Analysis of the tryptic search space in UniProt databases.

Proteomics. 2015 Jan;15(1):48-57. doi: 10.1002/pmic.201400227. Epub 2014 Dec 3.

PeptidePicker: a scientific workflow with web interface for selecting appropriate peptides for targeted proteomics experiments.

J Proteomics. 2014 Jun 25;106:151-61. doi: 10.1016/j.jprot.2014.04.018. Epub 2014 Apr 22.

Comparative database search engine analysis on massive tandem mass spectra of pork-based food products for halal proteomics.

J Proteomics. 2021 Jun 15;241:104240. doi: 10.1016/j.jprot.2021.104240. Epub 2021 Apr 21.

UniProtKB/Swiss-Prot.

Methods Mol Biol. 2007;406:89-112. doi: 10.1007/978-1-59745-535-0_4.

In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics.

J Proteomics. 2017 Jan 6;150:170-182. doi: 10.1016/j.jprot.2016.08.002. Epub 2016 Aug 4.

Tailoring to Search Engines: Bottom-Up Proteomics with Collision Energies Optimized for Identification Confidence.

J Proteome Res. 2021 Jan 1;20(1):474-484. doi: 10.1021/acs.jproteome.0c00518. Epub 2020 Dec 7.

The UniProtKB/Swiss-Prot knowledgebase and its Plant Proteome Annotation Program.

J Proteomics. 2009 Apr 13;72(3):567-73. doi: 10.1016/j.jprot.2008.11.010. Epub 2008 Nov 24.

Effective Leveraging of Targeted Search Spaces for Improving Peptide Identification in Tandem Mass Spectrometry Based Proteomics.

J Proteome Res. 2015 Dec 4;14(12):5169-78. doi: 10.1021/acs.jproteome.5b00504. Epub 2015 Nov 24.

Confetti: a multiprotease map of the HeLa proteome for comprehensive proteomics.

Mol Cell Proteomics. 2014 Jun;13(6):1573-84. doi: 10.1074/mcp.M113.035170. Epub 2014 Apr 2.

Cited by

Integrating multi-dimensional data to reveal the mechanisms and molecular targets of baikening granules for treatment of pediatric influenza.

Front Mol Biosci. 2025 Jul 11;12:1637980. doi: 10.3389/fmolb.2025.1637980. eCollection 2025.

ProtGraph: a tool for the quick and comprehensive exploration and exploitation of the peptide search space derived from protein sequence databases using graphs.

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae671.

Changes in the Proteome of Platelets from Patients with Critical Progression of COVID-19.

Cells. 2023 Sep 1;12(17):2191. doi: 10.3390/cells12172191.

Pectinases Secretion by : Optimization in Solid-State Fermentation and Identification by a Shotgun Proteomics Approach.

Molecules. 2022 Aug 5;27(15):4981. doi: 10.3390/molecules27154981.

Network Pharmacology and Molecular Docking to Elucidate the Potential Mechanism of Ligusticum Chuanxiong Against Osteoarthritis.

Front Pharmacol. 2022 Apr 14;13:854215. doi: 10.3389/fphar.2022.854215. eCollection 2022.
