Self-supervised learning of T cell receptor sequences exposes core properties for T cell membership.

Affiliation

The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel.

Publication information

Sci Adv. 2024 Apr 26;10(17):eadk4670. doi: 10.1126/sciadv.adk4670.

DOI:10.1126/sciadv.adk4670

PMID:38669334

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11809652/

Abstract

The T cell receptor (TCR) repertoire is an extraordinarily diverse collection of TCRs essential for maintaining the body's homeostasis and response to threats. In this study, we compiled an extensive dataset of more than 4200 bulk TCR repertoire samples, encompassing 221,176,713 sequences, alongside 6,159,652 single-cell TCR sequences from over 400 samples. From this dataset, we then selected a representative subset of 5 million bulk sequences and 4.2 million single-cell sequences to train two specialized Transformer-based language models for bulk (CVC) and single-cell (scCVC) TCR repertoires, respectively. We show that these models successfully capture TCR core qualities, such as sharing, gene composition, and single-cell properties. These qualities are emergent in the encoded TCR latent space and enable classification into TCR-based qualities such as public sequences. These models demonstrate the potential of Transformer-based language models in TCR downstream applications.
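The notion of "sharing" and "public sequences" mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a sequence is called public when it appears in at least `min_samples` distinct repertoire samples; that threshold, the function names, and the toy CDR3 strings are illustrative assumptions and are not taken from the paper, whose exact definition may differ.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def count_sharing(repertoires: Dict[str, Iterable[str]]) -> Counter:
    """Count, for each sequence, how many samples contain it at least once."""
    counts: Counter = Counter()
    for sequences in repertoires.values():
        # Deduplicate within a sample so clonal expansion does not inflate sharing.
        counts.update(set(sequences))
    return counts


def public_sequences(
    repertoires: Dict[str, Iterable[str]], min_samples: int = 2
) -> Set[str]:
    """Return sequences seen in at least `min_samples` samples (assumed definition)."""
    counts = count_sharing(repertoires)
    return {seq for seq, n in counts.items() if n >= min_samples}


# Toy repertoires with made-up CDR3-like strings (not from the study's dataset).
toy_repertoires = {
    "sample_A": ["CASSLGQAYEQYF", "CASSPDRGGYEQYF", "CASSLGQAYEQYF"],
    "sample_B": ["CASSLGQAYEQYF", "CASSQETQYF"],
    "sample_C": ["CASSQETQYF", "CASSLAPGATNEKLFF"],
}

sharing = count_sharing(toy_repertoires)
public = public_sequences(toy_repertoires, min_samples=2)
```

Labels produced this way could serve as targets for a classifier trained on sequence embeddings, which is the kind of downstream task the abstract describes; the embedding step itself is not sketched here.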

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d6b3/11809652/d5b7686c0c7d/sciadv.adk4670-f1.jpg
