
Determining epitope specificity of T-cell receptors with transformers.

Affiliations

Department of Intelligent Systems, Delft University of Technology, Delft 2600 GA, The Netherlands.

Leiden Computational Biology Center, Department of Molecular Epidemiology, Leiden University Medical Center, Leiden 2333 ZA, The Netherlands.

Publication Information

Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad632.

Abstract

SUMMARY

T-cell receptors (TCRs) on T cells recognize and bind to epitopes presented by the major histocompatibility complex in the case of infection or cancer. However, the high diversity of TCRs, as well as the unique and complex binding mechanisms underlying epitope recognition, makes it difficult to predict binding between TCRs and epitopes. Here, we present the utility of transformers, a deep learning strategy incorporating an attention mechanism that learns informative features, and show that these models, pre-trained on a large set of protein sequences, outperform current strategies. We compared three pre-trained auto-encoder transformer models (ProtBERT, ProtAlbert, and ProtElectra) and one pre-trained auto-regressive transformer model (ProtXLNet) for predicting the binding specificity of TCRs to 25 epitopes from the VDJdb database (human and murine). Two additional modifications were made to each of the four transformer models to incorporate the gene usage of the TCRs. Of all 12 transformer implementations (four models, each with three variants), a modified version of the ProtXLNet model predicted TCR-epitope pairs with the highest accuracy (weighted F1 score of 0.55 considering all 25 epitopes simultaneously). The modification added features representing the gene names of the TCRs. We also showed that the basic implementation of transformers outperformed previously available methods developed for the same biological problem, i.e. TCRGP, TCRdist, and DeepTCR, especially for hard-to-classify labels. We show that the proficiency of transformers in attention learning can be made operational in a complex biological setting such as TCR binding prediction. Further ingenuity in utilizing the full potential of transformers, whether through attention-head visualization or the introduction of additional features, can extend T-cell research avenues.
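Two ideas in the abstract lend themselves to a small illustration: protein language models such as ProtBERT take amino-acid sequences with residues separated by spaces, and the gene-usage modification supplies the TCR's V/J gene names as additional input features; the weighted F1 score then averages per-epitope F1 values weighted by each epitope's support. The sketch below is illustrative only and is not taken from the authors' code; the function names and example gene names are assumptions.

```python
from collections import Counter

def format_tcr_input(cdr3, v_gene=None, j_gene=None):
    """Format a TCR CDR3 sequence for a ProtBERT-style tokenizer.

    Protein language models in the ProtTrans family expect amino acids
    separated by spaces; here, the gene names (the modification described
    in the abstract) are appended as extra tokens. Illustrative only.
    """
    tokens = list(cdr3)  # one token per residue
    if v_gene:
        tokens.append(v_gene)
    if j_gene:
        tokens.append(j_gene)
    return " ".join(tokens)

def weighted_f1(y_true, y_pred):
    """Weighted F1 over all epitope classes: per-class F1 scores
    averaged with weights proportional to each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for c in set(y_true):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += (support[c] / total) * f1
    return score
```

For example, `format_tcr_input("CASSLGQAYEQYF", "TRBV7-9", "TRBJ2-7")` yields `"C A S S L G Q A Y E Q Y F TRBV7-9 TRBJ2-7"`, which a tokenizer can then split into residue tokens plus gene-name features; `weighted_f1` mirrors the "weighted F1 considering all 25 epitopes simultaneously" metric reported in the abstract.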

AVAILABILITY AND IMPLEMENTATION

Data and code are available on https://github.com/InduKhatri/tcrformer.


[Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0674/10636277/29179db05e11/btad632f1.jpg]
