Bioengineering, Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel.
Bar Ilan Institute of Nanotechnologies and Advanced Materials, Bar Ilan University, Ramat Gan, Israel.
Front Immunol. 2021 Mar 10;12:627813. doi: 10.3389/fimmu.2021.627813. eCollection 2021.
Celiac disease (CeD) is a common autoimmune disorder caused by an abnormal immune response to dietary gluten proteins. The disease has high heritability. HLA is the major susceptibility factor, and the HLA effect is mediated via presentation of deamidated gluten peptides by disease-associated HLA-DQ variants to CD4+ T cells. In addition to gluten-specific CD4+ T cells, patients have antibodies to the autoantigen transglutaminase 2 and to deamidated gluten peptides. These disease-specific antibodies recognize defined epitopes and display common usage of specific heavy and light chains across patients. Interactions between T cells and B cells are likely central in the pathogenesis, but how the repertoires of naïve T and B cells relate to the pathogenic effector cells is unexplored. To this end, we applied machine learning classification models to naïve B cell receptor (BCR) repertoires from CeD patients and healthy controls. Strikingly, the models achieved promising classification performance, with an F1 score of 85%. Clusters of heavy and light chain sequences were inferred and used as features for the model, and signatures associated with the disease were then characterized. These signatures included amino acid (AA) 3-mers with distinct bio-physicochemical characteristics and enriched V and J genes. We found that CeD-associated clusters can be identified and that common motifs can be characterized from naïve BCR repertoires. The results may indicate a genetic influence of BCR-encoding genes in CeD. Analysis of naïve BCRs as presented here may become an important part of assessing an individual's risk of developing CeD. Our model demonstrates the potential of using BCR repertoires, and in particular naïve BCR repertoires, as disease susceptibility markers.
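As a rough illustration of two quantities named in the abstract, the sketch below counts overlapping amino acid 3-mers in a set of sequences and computes an F1 score from binary labels. It is not the authors' pipeline: the function names (`aa_kmers`, `kmer_frequencies`, `f1_score`), the toy sequences, and the labels are all hypothetical, and the clustering of heavy and light chain sequences used as model features in the study is not reproduced here.

```python
from collections import Counter


def aa_kmers(sequence, k=3):
    """Return all overlapping k-mers of an amino acid sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_frequencies(sequences, k=3):
    """Relative frequency of each k-mer pooled over a set of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(aa_kmers(seq, k))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up inputs for illustration only (not data from the study).
toy_sequences = ["CARD", "CARE"]
toy_frequencies = kmer_frequencies(toy_sequences)

toy_true = [1, 1, 0, 0]   # 1 = CeD, 0 = healthy control (hypothetical labels)
toy_pred = [1, 0, 0, 1]
toy_f1 = f1_score(toy_true, toy_pred)
```

In a setting like the one described, per-subject k-mer or cluster frequencies would form the feature vector passed to a classifier, and F1 would be computed on held-out subjects; those steps are assumptions about a typical workflow rather than details stated in the abstract.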