OLGA: fast computation of generation probabilities of B- and T-cell receptor amino acid sequences and motifs.

Affiliations

Joseph Henry Laboratories, Princeton University, Princeton, NJ, USA.

Laboratoire de physique de l'Ecole normale supérieure (PSL University), Centre national de la recherche scientifique, Sorbonne University, University Paris-Diderot, Paris, France.

Publication information

Bioinformatics. 2019 Sep 1;35(17):2974-2981. doi: 10.1093/bioinformatics/btz035.

PMID:30657870

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6735909/

Abstract

MOTIVATION

High-throughput sequencing of large immune repertoires has enabled the development of methods that predict the probability that V(D)J recombination generates a T- or B-cell receptor with any given nucleotide sequence. These generation probabilities are highly non-uniform, spanning 20 orders of magnitude in real repertoires. Because a receptor's function depends on its protein sequence, it is important to be able to predict the generation probability at the amino acid level. However, brute-force summation over all nucleotide sequences that translate to a given amino acid sequence is computationally intractable. This paper presents a solution to this problem.
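Counting the nucleotide sequences that encode a single CDR3 shows why brute force scales badly. The sketch below assumes only the standard genetic code and multiplies the codon degeneracy of each residue. CASSLGQAYEQYF is an illustrative CDR3 chosen for this example, not a sequence taken from the paper, and `n_nucleotide_sequences` is a hypothetical helper name.

```python
from collections import defaultdict
from itertools import product
from math import prod

# Standard genetic code with codons enumerated in TCAG order (third base varies
# fastest); '*' marks the three stop codons.
BASES = "TCAG"
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to the list of codons that encode it.
SYNONYMOUS = defaultdict(list)
for (b1, b2, b3), aa in zip(product(BASES, repeat=3), AA_TABLE):
    SYNONYMOUS[aa].append(b1 + b2 + b3)


def n_nucleotide_sequences(cdr3_aa: str) -> int:
    """Number of distinct nucleotide sequences that translate to cdr3_aa."""
    return prod(len(SYNONYMOUS[aa]) for aa in cdr3_aa)


print(n_nucleotide_sequences("CASSLGQAYEQYF"))  # 1769472
```

Each of these nucleotide sequences would still need its own sum over recombination scenarios. The count also multiplies with every added residue, so explicit enumeration does not scale to long CDR3s, motifs, or whole databases.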

RESULTS

We use dynamic programming to construct an efficient and flexible algorithm, called OLGA (Optimized Likelihood estimate of immunoGlobulin Amino-acid sequences), that calculates the probability that V(D)J recombination in B or T cells generates a given CDR3 amino acid sequence or motif, with or without restriction on the V and J genes. We apply it to databases of epitope-specific T-cell receptors to estimate the probability that a typical human subject possesses T cells responsive to specific disease-associated epitopes. The model predictions agree well with published data. We suggest that OLGA may be a useful tool for guiding vaccine design.
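The abstract does not spell out OLGA's recursion. The sketch below illustrates only the general dynamic-programming idea: sum over synonymous codons one residue at a time, instead of enumerating every nucleotide sequence.

It assumes a hypothetical toy model in which nucleotides are emitted by a first-order Markov chain. The parameters `PI` and `T` and the functions `codon_prob`, `pgen_aa_dp` and `pgen_aa_brute` are illustrative names. OLGA's actual model also sums over V, D and J gene choices, deletions and insertions, so this is not its implementation.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code (TCAG order) grouped into synonymous codons.
BASES = "TCAG"
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SYNONYMOUS = defaultdict(list)
for (b1, b2, b3), aa in zip(product(BASES, repeat=3), AA_TABLE):
    SYNONYMOUS[aa].append(b1 + b2 + b3)

# Hypothetical toy model (not OLGA's): nucleotides emitted by a first-order
# Markov chain with initial distribution PI and transition matrix T.
PI = {"A": 0.3, "C": 0.2, "G": 0.3, "T": 0.2}
T = {
    "A": {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    "C": {"A": 0.1, "C": 0.3, "G": 0.4, "T": 0.2},
    "G": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "T": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
}


def codon_prob(prev, codon):
    """Probability of emitting codon given the preceding nucleotide (None at the start)."""
    first = PI[codon[0]] if prev is None else T[prev][codon[0]]
    return first * T[codon[0]][codon[1]] * T[codon[1]][codon[2]]


def pgen_aa_dp(aa_seq):
    """Sum P(nt) over all nucleotide sequences nt translating to aa_seq, by DP."""
    # forward[x]: total probability of all nucleotide prefixes that encode the
    # residues processed so far and end in nucleotide x.
    forward = {None: 1.0}
    for aa in aa_seq:
        new_forward = dict.fromkeys(BASES, 0.0)
        for prev, mass in forward.items():
            for codon in SYNONYMOUS[aa]:
                new_forward[codon[2]] += mass * codon_prob(prev, codon)
        forward = new_forward
    return sum(forward.values())


def pgen_aa_brute(aa_seq):
    """Same quantity by explicit enumeration; feasible only for short sequences."""
    total = 0.0
    for codons in product(*(SYNONYMOUS[aa] for aa in aa_seq)):
        nt = "".join(codons)
        p = PI[nt[0]]
        for x, y in zip(nt, nt[1:]):
            p *= T[x][y]
        total += p
    return total


print(pgen_aa_dp("CASS"), pgen_aa_brute("CASS"))
```

`forward` tracks only the last nucleotide, so the work grows linearly with CDR3 length. Here that is at most 4 × 6 codon terms per residue, rather than the product of codon degeneracies. The brute-force function is included only to check the recursion on short inputs.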

AVAILABILITY AND IMPLEMENTATION

Source code is available at https://github.com/zsethna/OLGA.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
