PEPMatch: a tool to identify short peptide sequence matches in large sets of proteins.

Affiliations

Division of Vaccine Discovery, La Jolla Institute for Immunology, La Jolla, San Diego, CA, USA.

University of California San Diego School of Medicine, La Jolla, San Diego, CA, USA.

Publication details

BMC Bioinformatics. 2023 Dec 18;24(1):485. doi: 10.1186/s12859-023-05606-4.

DOI: 10.1186/s12859-023-05606-4

PMID: 38110863

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10726511/

Abstract

BACKGROUND

Numerous tools exist for biological sequence comparison and search. One case of particular interest to immunologists is finding matches for linear peptide T cell epitopes, typically 8 to 15 residues long, in a large set of protein sequences, either as exact matches or as matches that allow residue substitutions. Such tools are critical in applications that include identifying conservation across viral epitopes, identifying putative epitope targets for allergens, and finding matches for cancer-associated neoepitopes to examine the role of tolerance in tumor recognition.
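For concreteness, a match can be defined as an ungapped stretch of a protein, the same length as the peptide, that differs from it at no more than a chosen number of positions (the Hamming distance). The brute-force Python sketch below assumes that definition; the function names `hamming` and `find_matches_bruteforce` and the toy sequences are hypothetical, and this is not PEPMatch's implementation. Setting `max_mismatches=0` gives exact matching.

```python
def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_matches_bruteforce(peptide, proteins, max_mismatches=0):
    """Scan every ungapped window of every protein (hypothetical helper).

    Returns (protein_id, 1-based start, matched window, mismatches) tuples
    for windows within max_mismatches substitutions of the peptide.
    """
    k = len(peptide)
    hits = []
    for protein_id, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            d = hamming(peptide, window)
            if d <= max_mismatches:
                hits.append((protein_id, i + 1, window, d))
    return hits


# Toy proteins: P2 differs from P1 by one substitution (I -> L).
proteins = {
    "P1": "MKTAYIAKQRQISFVKSHFSRQ",
    "P2": "MKTAYIAKQRQLSFVKSHFSRQ",
}
print(find_matches_bruteforce("AKQRQISFV", proteins))
# [('P1', 7, 'AKQRQISFV', 0)]
print(find_matches_bruteforce("AKQRQISFV", proteins, max_mismatches=1))
# [('P1', 7, 'AKQRQISFV', 0), ('P2', 7, 'AKQRQLSFV', 1)]
```

Every query rescans every window of every protein, so the cost grows with proteome size times the number of peptides; this is the cost that index-based tools aim to avoid.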

RESULTS

We defined a set of benchmarks that reflect the different practical applications of short peptide sequence matching. We evaluated a suite of existing methods for speed and recall and developed a new tool, PEPMatch. The tool uses a deterministic k-mer mapping algorithm that preprocesses proteomes before searching, achieving a 50-fold increase in speed over methods such as the Basic Local Alignment Search Tool (BLAST) without compromising recall. PEPMatch's code and benchmark datasets are publicly available.
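The preprocessing step can be illustrated with a minimal Python sketch, assuming an in-memory dictionary that maps every k-mer in the proteome to its (protein, position) locations. The class `KmerIndex`, the value of `k`, and the mismatch strategy are illustrative assumptions, not PEPMatch's actual code or API. Substitution-tolerant search here uses the pigeonhole principle: if the peptide is cut into `m + 1` disjoint k-mer segments, any window with at most `m` substitutions must contain at least one segment unchanged, so exact index lookups of the segments yield candidate positions that are then checked by Hamming distance. With `m = 0` this reduces to exact search from a single lookup.

```python
from collections import defaultdict


class KmerIndex:
    """Hypothetical k-mer index over a set of protein sequences."""

    def __init__(self, proteins, k):
        self.k = k
        self.proteins = proteins
        # Preprocessing: map each k-mer to every (protein_id, 0-based position).
        self.index = defaultdict(list)
        for pid, seq in proteins.items():
            for i in range(len(seq) - k + 1):
                self.index[seq[i:i + k]].append((pid, i))

    def search(self, peptide, max_mismatches=0):
        """Return sorted (protein_id, 1-based start, window, mismatches) hits."""
        k, n = self.k, len(peptide)
        if (max_mismatches + 1) * k > n:
            raise ValueError("peptide too short for k and max_mismatches")
        hits = set()
        # Pigeonhole: with m substitutions, at least one of m + 1 disjoint
        # k-mer segments of the peptide must occur unchanged in the protein.
        for s in range(max_mismatches + 1):
            offset = s * k
            segment = peptide[offset:offset + k]
            # .get() avoids inserting empty keys into the defaultdict.
            for pid, pos in self.index.get(segment, []):
                start = pos - offset  # candidate start of the full peptide
                seq = self.proteins[pid]
                if start < 0 or start + n > len(seq):
                    continue
                window = seq[start:start + n]
                d = sum(1 for a, b in zip(peptide, window) if a != b)
                if d <= max_mismatches:
                    hits.add((pid, start + 1, window, d))
        return sorted(hits)


proteins = {
    "P1": "MKTAYIAKQRQISFVKSHFSRQ",
    "P2": "MKTAYIAKQRQLSFVKSHFSRQ",
}
idx = KmerIndex(proteins, k=3)
print(idx.search("AKQRQISFV"))
# [('P1', 7, 'AKQRQISFV', 0)]
print(idx.search("AKQRQISFV", max_mismatches=1))
# [('P1', 7, 'AKQRQISFV', 0), ('P2', 7, 'AKQRQLSFV', 1)]
```

Building the index once lets many peptides be searched without rescanning the proteome; the trade-off is the memory the index occupies and, in this sketch, the requirement that peptides be at least `(m + 1) * k` residues long.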

CONCLUSIONS

PEPMatch offers significant speed and recall advantages for peptide sequence matching. While it is of immediate utility to immunologists, the benchmarking framework developed here also provides a standard against which future tools can be evaluated. The tool is available at https://nextgen-tools.iedb.org, and the source code can be found at https://github.com/IEDB/PEPMatch.

Similar articles

PEPMatch: a tool to identify short peptide sequence matches in large sets of proteins.

BMC Bioinformatics. 2023 Dec 18;24(1):485. doi: 10.1186/s12859-023-05606-4.

Next-generation IEDB tools: a platform for epitope prediction and analysis.

Nucleic Acids Res. 2024 Jul 5;52(W1):W526-W532. doi: 10.1093/nar/gkae407.

Pep-3D-Search: a method for B-cell epitope prediction based on mimotope analysis.

BMC Bioinformatics. 2008 Dec 16;9:538. doi: 10.1186/1471-2105-9-538.

Benchmarking predictions of MHC class I restricted T cell epitopes in a comprehensively studied model system.

PLoS Comput Biol. 2020 May 26;16(5):e1007757. doi: 10.1371/journal.pcbi.1007757. eCollection 2020 May.

Hammock: a hidden Markov model-based peptide clustering algorithm to identify protein-interaction consensus motifs in large datasets.

Bioinformatics. 2016 Jan 1;32(1):9-16. doi: 10.1093/bioinformatics/btv522. Epub 2015 Sep 5.

CAVES: A Novel Tool for Comparative Analysis of Variant Epitope Sequences.

Viruses. 2022 May 26;14(6):1152. doi: 10.3390/v14061152.

MMseqs software suite for fast and deep clustering and searching of large protein sequence sets.

Bioinformatics. 2016 May 1;32(9):1323-30. doi: 10.1093/bioinformatics/btw006. Epub 2016 Jan 6.

GuiTope: an application for mapping random-sequence peptides to protein sequences.

BMC Bioinformatics. 2012 Jan 3;13:1. doi: 10.1186/1471-2105-13-1.

Population-level distribution and putative immunogenicity of cancer neoepitopes.

BMC Cancer. 2018 Apr 13;18(1):414. doi: 10.1186/s12885-018-4325-6.

Automated benchmarking of peptide-MHC class I binding predictions.

Bioinformatics. 2015 Jul 1;31(13):2174-81. doi: 10.1093/bioinformatics/btv123. Epub 2015 Feb 25.

Cited by

Distinct Omicron longitudinal memory T cell profile and T cell receptor repertoire associated with COVID-19 hospitalisation.

Front Immunol. 2025 Apr 2;16:1549570. doi: 10.3389/fimmu.2025.1549570. eCollection 2025.

A single point mutation on FLT3L-Fc protein increases the risk of immunogenicity.

Front Immunol. 2025 Feb 13;16:1519452. doi: 10.3389/fimmu.2025.1519452. eCollection 2025.

Nonclinical immunogenicity risk assessment for knobs-into-holes bispecific IgG antibodies.

MAbs. 2024 Jan-Dec;16(1):2362789. doi: 10.1080/19420862.2024.2362789. Epub 2024 Jun 6.

Next-generation IEDB tools: a platform for epitope prediction and analysis.

Nucleic Acids Res. 2024 Jul 5;52(W1):W526-W532. doi: 10.1093/nar/gkae407.

Variable domain mutational analysis to probe the molecular mechanisms of high viscosity of an IgG antibody.

MAbs. 2024 Jan-Dec;16(1):2304282. doi: 10.1080/19420862.2024.2304282. Epub 2024 Jan 25.
