
Immune2vec: Embedding B/T Cell Receptor Sequences in ℝ^N Using Natural Language Processing.

Affiliations

Bioengineering, Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel.

Bar Ilan Institute of Nanotechnologies and Advanced Materials, Bar Ilan University, Ramat Gan, Israel.

Publication information

Front Immunol. 2021 Jul 22;12:680687. doi: 10.3389/fimmu.2021.680687. eCollection 2021.

DOI:10.3389/fimmu.2021.680687
PMID:34367141
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8340020/
Abstract

The adaptive branch of the immune system learns pathogenic patterns and remembers them for future encounters. It does so through dynamic and diverse repertoires of T- and B- cell receptors (TCR and BCRs, respectively). These huge immune repertoires in each individual present investigators with the challenge of extracting meaningful biological information from multi-dimensional data. The ability to embed these DNA and amino acid textual sequences in a vector-space is an important step towards developing effective analysis methods. Here we present Immune2vec, an adaptation of a natural language processing (NLP)-based embedding technique for BCR repertoire sequencing data. We validate Immune2vec on amino acid 3-gram sequences, continuing to longer BCR sequences, and finally to entire repertoires. Our work demonstrates Immune2vec to be a reliable low-dimensional representation that preserves relevant information of immune sequencing data, such as n-gram properties and IGHV gene family classification. Applying Immune2vec along with machine learning approaches to patient data exemplifies how distinct clinical conditions can be effectively stratified, indicating that the embedding space can be used for feature extraction and exploratory data analysis.
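
The abstract mentions validating the embedding on amino acid 3-grams. As a rough illustration of what such tokens look like, the sketch below splits a sequence into overlapping 3-grams with a sliding window. The function name `to_ngrams`, the overlapping (step 1) windowing, and the example sequence are assumptions made for illustration; the abstract does not state how Immune2vec tokenizes sequences.

```python
from typing import List


def to_ngrams(sequence: str, n: int = 3) -> List[str]:
    """Split an amino acid sequence into overlapping n-grams (window step of 1).

    Assumption: overlapping windows are used here for simplicity; this is not
    necessarily the tokenization used by Immune2vec.
    """
    sequence = sequence.strip().upper()
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


# Hypothetical short amino acid string, for illustration only.
example = "CARDYW"
print(to_ngrams(example))  # ['CAR', 'ARD', 'RDY', 'DYW']
```

Token lists of this kind are what a word-embedding model would typically consume as "sentences", with each 3-gram playing the role of a word.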


Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0bf0/8340020/49869ae6c6e5/fimmu-12-680687-g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0bf0/8340020/77aafe0bd6e6/fimmu-12-680687-g002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0bf0/8340020/ba64a21342be/fimmu-12-680687-g003.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0bf0/8340020/e25596c287cd/fimmu-12-680687-g004.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0bf0/8340020/8833365fa88b/fimmu-12-680687-g005.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0bf0/8340020/8fbe8dec542e/fimmu-12-680687-g006.jpg

