Joseph Henry Laboratories, Princeton University, Princeton, NJ, USA.
Laboratoire de physique de l'Ecole normale supérieure (PSL University), Centre national de la recherche scientifique, Sorbonne University, University Paris-Diderot, Paris, France.
Bioinformatics. 2019 Sep 1;35(17):2974-2981. doi: 10.1093/bioinformatics/btz035.
High-throughput sequencing of large immune repertoires has enabled the development of methods to predict the probability that V(D)J recombination generates any specific nucleotide sequence of a T- or B-cell receptor. These generation probabilities are highly non-uniform, ranging over 20 orders of magnitude in real repertoires. Since the function of a receptor depends on its protein sequence, it is important to be able to predict this probability of generation at the amino acid level. However, brute-force summation over all nucleotide sequences with the correct amino acid translation is computationally intractable. The purpose of this paper is to present a solution to this problem.
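To see why brute force is impractical, count the nucleotide sequences that encode a single amino acid sequence under the standard genetic code. That count is the product of the codon degeneracies at each position. The CDR3-like sequence below is purely illustrative and is not taken from the paper. The function name is hypothetical.

```python
from collections import Counter
import math

# Standard genetic code: the amino acid encoded by each of the 64 codons,
# listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
DEGENERACY = Counter(AMINO)  # number of synonymous codons per amino acid


def n_nucleotide_sequences(aa_seq):
    """Number of distinct nucleotide sequences that translate into aa_seq."""
    return math.prod(DEGENERACY[aa] for aa in aa_seq)


# Illustrative 16-residue CDR3-like sequence (hypothetical example).
print(n_nucleotide_sequences("CASSLAPGATNEKLFF"))  # 339738624
```

Even this short sequence has about 3.4 × 10^8 synonymous nucleotide sequences, and the count grows exponentially with length. A generation model must also account for the many recombination events that can produce each of them.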
We use dynamic programming to construct an efficient and flexible algorithm, called OLGA (Optimized Likelihood estimate of immunoGlobulin Amino-acid sequences), for calculating the probability of generating a given CDR3 amino acid sequence or motif, with or without V/J restriction, as a result of V(D)J recombination in B or T cells. We apply it to databases of epitope-specific T-cell receptors to evaluate the probability that a typical human subject will possess T cells responsive to specific disease-associated epitopes. The model predictions show excellent agreement with published data. We suggest that OLGA may be a useful tool to guide vaccine design.
Source code is available at https://github.com/zsethna/OLGA.
Supplementary data are available at Bioinformatics online.