BCR CDR3 length distributions differ between blood and spleen and between old and young patients, and TCR distributions can be used to detect myelodysplastic syndrome.

Affiliation

The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel.

Publication information

Phys Biol. 2013 Oct;10(5):056001. doi: 10.1088/1478-3975/10/5/056001. Epub 2013 Aug 22.

Abstract

Complementarity-determining region 3 (CDR3) is the most hypervariable region in B cell receptor (BCR) and T cell receptor (TCR) genes and the most critical structure in antigen recognition, and thereby in determining the fates of developing and responding lymphocytes. There are millions of different TCR Vβ chain or BCR heavy chain CDR3 sequences in human blood. Even now that high-throughput sequencing is widely used, CDR3 length distributions (also called spectratypes) remain a much quicker and cheaper method of assessing repertoire diversity. However, the complexity of the distributions and the large amount of information per sample (e.g. 32 distributions of the TCRα chain, and 24 of TCRβ) call for the use of machine learning tools for full exploration. We examined the ability of supervised machine learning, which uses computational models to find hidden patterns in predefined biological groups, to analyze CDR3 length distributions from various sources and to distinguish between experimental groups. We found that (a) splenic BCR CDR3 length distributions are characterized by low standard deviations and few local maxima, compared to peripheral blood distributions; (b) healthy elderly people's BCR CDR3 length distributions can be distinguished from those of the young; and (c) a machine learning model based on TCR CDR3 distribution features can detect myelodysplastic syndrome with approximately 93% accuracy. Overall, we demonstrate that supervised machine learning methods can contribute to our understanding of lymphocyte repertoire diversity.
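
The abstract names two distribution features, the standard deviation and the number of local maxima, without saying how they were computed. The following Python sketch shows one plausible way to derive them from a single CDR3 length distribution. The function name `spectratype_features`, the strict-inequality definition of a local maximum, and the example values are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, Sequence


def spectratype_features(counts: Sequence[float]) -> Dict[str, float]:
    """Summarize one CDR3 length distribution (hypothetical helper).

    counts[i] is the signal (e.g. read count or peak area) in the i-th
    length bin; bins are assumed to be equally spaced and ordered by length.
    """
    total = float(sum(counts))
    if total <= 0:
        raise ValueError("distribution must have positive total signal")
    freqs = [c / total for c in counts]

    # Frequency-weighted mean and standard deviation of the bin index.
    mean = sum(i * f for i, f in enumerate(freqs))
    variance = sum(f * (i - mean) ** 2 for i, f in enumerate(freqs))
    std = variance ** 0.5

    # Assumed definition: a bin is a local maximum when it is strictly
    # higher than both neighbours; bins outside the range count as zero.
    padded = [0.0] + freqs + [0.0]
    n_maxima = sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )
    return {"mean": mean, "std": std, "n_local_maxima": n_maxima}


# Made-up distributions, purely illustrative (not data from the study).
narrow = spectratype_features([0, 1, 8, 1, 0])
broad = spectratype_features([3, 1, 3, 1, 3])
```

Feature dictionaries of this kind, computed per distribution, could then be assembled into the input of a supervised classifier; the abstract does not specify which model or which further features the authors used.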
