Laboratoire de physique de l'École normale supérieure, CNRS, PSL University, Sorbonne Université, and Université de Paris, Paris, France.
PLoS Comput Biol. 2022 Jun 2;18(6):e1010167. doi: 10.1371/journal.pcbi.1010167. eCollection 2022 Jun.
Affinity maturation is crucial for improving the binding affinity of antibodies to antigens. This process is driven mainly by point substitutions caused by somatic hypermutation of the immunoglobulin gene, but it also includes deletions and insertions of genomic material, known as indels. While the landscape of point substitutions has been studied extensively, a detailed statistical description of indels is still lacking. Here we present a probabilistic inference tool that learns the statistics of indels from repertoire sequencing data and overcomes the pitfalls and biases of standard annotation methods. The model includes antibody-specific maturation ages to account for variable mutational loads across the repertoire. After validating it on synthetic data, we applied our tool to a large dataset of human immunoglobulin heavy chains. The inferred model allows us to identify universal statistical features of indels in heavy chains. We report distinct insertion and deletion hotspots and show that indel lengths follow a geometric distribution, which constrains future mechanistic models of the hypermutation process.
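To make the geometric-length claim concrete: a geometric law has a single parameter, P(L = ℓ) = (1 − p)^(ℓ−1) · p for ℓ = 1, 2, …, and its maximum-likelihood estimate from a sample of lengths is p̂ = n / Σℓ, i.e. one over the mean length. The sketch below assumes this parameterization (support starting at 1) and a plain list of observed indel lengths. It is not the inference tool described in this paper, which fits a full probabilistic model with antibody-specific maturation ages. The function names `fit_geometric` and `geometric_pmf` and the sample lengths are hypothetical, chosen for illustration only.

```python
from collections import Counter


def fit_geometric(lengths):
    """Maximum-likelihood p for P(L = l) = (1 - p)**(l - 1) * p, l = 1, 2, ...

    For this parameterization the MLE is n / sum(lengths), i.e. 1 / mean length.
    """
    if not lengths or min(lengths) < 1:
        raise ValueError("lengths must be a non-empty list of integers >= 1")
    return len(lengths) / sum(lengths)


def geometric_pmf(l, p):
    """Probability of observing length l under the geometric law with parameter p."""
    return (1 - p) ** (l - 1) * p


# Hypothetical indel lengths (unit left unspecified), for illustration only.
lengths = [1, 1, 2, 1, 3, 2, 1, 1]
p_hat = fit_geometric(lengths)

# Compare observed counts with the counts expected under the fitted law.
counts = Counter(lengths)
for l in sorted(counts):
    expected = len(lengths) * geometric_pmf(l, p_hat)
    print(f"length {l}: observed {counts[l]}, expected {expected:.2f}")
```

Under this form, the probability of extending an indel by one more unit is the constant 1 − p regardless of its current length (the memoryless property of the geometric distribution).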