Determining epitope specificity of T-cell receptors with transformers.

Affiliations

Department of Intelligent Systems, Delft University of Technology, Delft 2600 GA, The Netherlands.

Leiden Computational Biology Center, Department of Molecular Epidemiology, Leiden University Medical Center, Leiden 2333 ZA, The Netherlands.

Publication information

Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad632.

DOI:10.1093/bioinformatics/btad632

PMID:37847663

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10636277/

Abstract

SUMMARY

T-cell receptors (TCRs) on T cells recognize and bind to epitopes presented by the major histocompatibility complex during infection or cancer. However, the high diversity of TCRs, together with the unique and complex binding mechanisms underlying epitope recognition, makes it difficult to predict binding between TCRs and epitopes. Here, we present the utility of transformers, a deep learning strategy that incorporates an attention mechanism to learn informative features, and show that these models, pre-trained on a large set of protein sequences, outperform current strategies. We compared three pre-trained auto-encoder transformer models (ProtBERT, ProtAlbert, and ProtElectra) and one pre-trained auto-regressive transformer model (ProtXLNet) for predicting the binding specificity of TCRs to 25 epitopes from the VDJdb database (human and murine). Two additional modifications were made to incorporate gene usage of the TCRs into the four transformer models. Of all 12 transformer implementations (four models, each in a baseline version and two modified versions), a modified version of the ProtXLNet model predicted TCR-epitope pairs with the highest accuracy (weighted F1 score of 0.55 when considering all 25 epitopes simultaneously). This modification added features representing the gene names of the TCRs. We also showed that the basic implementation of transformers outperformed previously available methods developed for the same biological problem, i.e. TCRGP, TCRdist, and DeepTCR, especially for the hard-to-classify labels. We show that the proficiency of transformers in attention learning can be made operational in a complex biological setting such as TCR binding prediction. Further ingenuity in utilizing the full potential of transformers, either through attention head visualization or by introducing additional features, can extend T-cell research avenues.
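The weighted F1 score reported above summarizes performance across all 25 epitope classes in a single number. The abstract does not say how the score was computed, so the following is only a minimal sketch that assumes the common definition (the same one scikit-learn uses for `average='weighted'`): the per-class F1 scores are averaged, each weighted by the number of true examples of that class. The function name `weighted_f1` and the epitope labels are hypothetical and chosen for illustration.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores.

    Assumption: this follows the usual multi-class definition; it is not
    taken from the paper's code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    support = Counter(y_true)          # number of true examples per class
    predicted = Counter(y_pred)        # number of predictions per class
    total = len(y_true)

    score = 0.0
    for label, n_true in support.items():
        # True positives: examples of this class that were predicted as such.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = predicted.get(label, 0)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += (n_true / total) * f1
    return score


# Toy example with three hypothetical epitope labels (not real data).
y_true = ["epitope_A"] * 4 + ["epitope_B"] * 2 + ["epitope_C"]
y_pred = ["epitope_A"] * 3 + ["epitope_B"] * 2 + ["epitope_C"] * 2
print(round(weighted_f1(y_true, y_pred), 4))
```

Because each class contributes in proportion to its support, frequent epitopes dominate the score. A weighted F1 is therefore best read alongside per-class results when some labels are rare or hard to classify.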

AVAILABILITY AND IMPLEMENTATION

Data and code are available at https://github.com/InduKhatri/tcrformer.

Similar articles

Predicting TCR sequences for unseen antigen epitopes using structural and sequence features.

Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae210.

EPIC-TRACE: predicting TCR binding to unseen epitopes using attention and contextualized embeddings.

Bioinformatics. 2023 Dec 1;39(12). doi: 10.1093/bioinformatics/btad743.

Predicting recognition between T cell receptors and epitopes with TCRGP.

PLoS Comput Biol. 2021 Mar 25;17(3):e1008814. doi: 10.1371/journal.pcbi.1008814. eCollection 2021 Mar.

TITAN: T-cell receptor specificity prediction with bimodal attention networks.

Bioinformatics. 2021 Jul 12;37(Suppl_1):i237-i244. doi: 10.1093/bioinformatics/btab294.

Prediction of Specific TCR-Peptide Binding From Large Dictionaries of TCR-Peptide Pairs.

Front Immunol. 2020 Aug 25;11:1803. doi: 10.3389/fimmu.2020.01803. eCollection 2020.

The study of high-affinity TCRs reveals duality in T cell recognition of antigen: specificity and degeneracy.

J Immunol. 2006 Nov 15;177(10):6911-9. doi: 10.4049/jimmunol.177.10.6911.

Quantifiable predictive features define epitope-specific T cell receptor repertoires.

Nature. 2017 Jul 6;547(7661):89-93. doi: 10.1038/nature22383. Epub 2017 Jun 21.

An Attention Based Bidirectional LSTM Method to Predict the Binding of TCR and Epitope.

IEEE/ACM Trans Comput Biol Bioinform. 2022 Nov-Dec;19(6):3272-3280. doi: 10.1109/TCBB.2021.3115353. Epub 2022 Dec 8.

TCRconv: predicting recognition between T cell receptors and epitopes using contextualized motifs.

Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac788.

References cited in this article

NetTCR-2.0 enables accurate prediction of TCR-peptide binding by using paired TCRα and β sequence data.

Commun Biol. 2021 Sep 10;4(1):1060. doi: 10.1038/s42003-021-02610-3.

TITAN: T-cell receptor specificity prediction with bimodal attention networks.

Bioinformatics. 2021 Jul 12;37(Suppl_1):i237-i244. doi: 10.1093/bioinformatics/btab294.

ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning.

IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):7112-7127. doi: 10.1109/TPAMI.2021.3095381. Epub 2022 Sep 14.

Predicting recognition between T cell receptors and epitopes with TCRGP.

PLoS Comput Biol. 2021 Mar 25;17(3):e1008814. doi: 10.1371/journal.pcbi.1008814. eCollection 2021 Mar.

DeepTCR is a deep learning framework for revealing sequence concepts within T-cell repertoires.

Nat Commun. 2021 Mar 11;12(1):1605. doi: 10.1038/s41467-021-21879-w.

SETE: Sequence-based Ensemble learning approach for TCR Epitope binding prediction.

Comput Biol Chem. 2020 Jun 20;87:107281. doi: 10.1016/j.compbiolchem.2020.107281.

Detection of Enriched T Cell Epitope Specificity in Full T Cell Receptor Sequence Repertoires.

Front Immunol. 2019 Nov 29;10:2820. doi: 10.3389/fimmu.2019.02820. eCollection 2019.

Detecting T cell receptors involved in immune responses from single repertoire snapshots.

PLoS Biol. 2019 Jun 13;17(6):e3000314. doi: 10.1371/journal.pbio.3000314. eCollection 2019 Jun.

Focal Loss for Dense Object Detection.

IEEE Trans Pattern Anal Mach Intell. 2020 Feb;42(2):318-327. doi: 10.1109/TPAMI.2018.2858826. Epub 2018 Jul 23.

VDJdb: a curated database of T-cell receptor sequences with known antigen specificity.

Nucleic Acids Res. 2018 Jan 4;46(D1):D419-D427. doi: 10.1093/nar/gkx760.
