Comparative Study of Repertoire Classification Methods Reveals Data Efficiency of k-mer Feature Extraction.

Affiliations

Graduate School of Engineering, The University of Tokyo, Tokyo, Japan.

Institute of Industrial Science, The University of Tokyo, Tokyo, Japan.

Publication information

Front Immunol. 2022 Jul 20;13:797640. doi: 10.3389/fimmu.2022.797640. eCollection 2022.

DOI:10.3389/fimmu.2022.797640
PMID:35936014
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9346074/
Abstract

The repertoire of T cell receptors (TCRs) encodes various types of immunological information. Machine learning is indispensable for decoding such information from repertoire datasets measured by next-generation sequencing (NGS). In particular, the classification of repertoires is the most basic task, and it is relevant to a variety of scientific and clinical problems. Supported by the recent appearance of large datasets, efficient but data-expensive methods have been proposed. However, it is unclear whether they can work efficiently when the available sample size is severely restricted, as it is in practical situations. In this study, we demonstrate that their performance can be impaired substantially below critical sample sizes. To compensate for this drawback, we propose MotifBoost, which exploits the information in short k-mer motifs of TCRs. MotifBoost can perform the classification as efficiently as a deep learning method on large datasets while providing more stable and reliable results on small datasets. We tested MotifBoost on four small datasets covering various conditions, such as Cytomegalovirus (CMV), HIV, α-chain, and β-chain, and it consistently preserved its stability. We also clarify that the robustness of MotifBoost can be attributed to the efficiency of k-mer motifs as representation features of repertoires. Finally, by comparing the predictions of these methods, we show that whole-sequence identity and sequence motifs encode partially different information, and that a combination of such complementary information is necessary for further development of repertoire analysis.
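As an illustration of the k-mer motif idea described in the abstract, the following is a minimal sketch of turning a repertoire (a list of TCR amino-acid sequences) into a fixed-length k-mer frequency vector. It is not the MotifBoost implementation: the function names `kmer_counts` and `repertoire_features`, the choice `k=3`, the 20-letter amino-acid alphabet, and the normalization to frequencies are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino acids define the k-mer vocabulary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_counts(sequence, k):
    """Count overlapping k-mers in one amino-acid sequence."""
    # Sequences shorter than k yield an empty range and thus no k-mers.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def repertoire_features(sequences, k=3):
    """Return a fixed-length k-mer frequency vector for one repertoire.

    The vector has 20**k entries, one per possible k-mer, in the
    lexicographic order produced by itertools.product.
    """
    totals = Counter()
    for seq in sequences:
        totals.update(kmer_counts(seq, k))

    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    # Only k-mers made of standard amino acids contribute to the total.
    n = sum(totals[m] for m in vocabulary)
    return [totals[m] / n if n else 0.0 for m in vocabulary]


if __name__ == "__main__":
    # Hypothetical toy repertoire, not real data.
    toy_repertoire = ["CASSL", "CASS"]
    features = repertoire_features(toy_repertoire, k=3)
    print(len(features))
```

Because every repertoire maps to a vector of the same length regardless of how many sequences it contains, such vectors can be passed to a standard classifier (for example a gradient-boosted tree model, which is an assumption here rather than something stated in the abstract).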


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c98/9346074/02be8a3757a6/fimmu-13-797640-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c98/9346074/31439a513122/fimmu-13-797640-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c98/9346074/9cdd7473d262/fimmu-13-797640-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c98/9346074/c4b769a8aded/fimmu-13-797640-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c98/9346074/21b2d2b81fb8/fimmu-13-797640-g005.jpg


