Joseph Henry Laboratories, Princeton University, Princeton, NJ 08544, USA.
Proc Natl Acad Sci U S A. 2012 Oct 2;109(40):16161-6. doi: 10.1073/pnas.1212755109. Epub 2012 Sep 17.
Stochastic rearrangement of germline V-, D-, and J-genes to create variable coding sequence for certain cell surface receptors is at the origin of immune system diversity. This process, known as "VDJ recombination", is implemented via a series of stochastic molecular events involving gene choices and random nucleotide insertions between, and deletions from, genes. We use large sequence repertoires of the variable CDR3 region of human CD4+ T-cell receptor beta chains to infer the statistical properties of these basic biochemical events. Because any given CDR3 sequence can be produced in multiple ways, the probability distribution of hidden recombination events cannot be inferred directly from the observed sequences; we therefore develop a maximum likelihood inference method to achieve this end. To separate the properties of the molecular rearrangement mechanism from the effects of selection, we focus on nonproductive CDR3 sequences in T-cell DNA. We infer the joint distribution of the various generative events that occur when a new T-cell receptor gene is created. We find a rich picture of correlation (and absence thereof), providing insight into the molecular mechanisms involved. The generative event statistics are consistent between individuals, suggesting a universal biochemical process. Our probabilistic model predicts the generation probability of any specific CDR3 sequence by the primitive recombination process, allowing us to quantify the potential diversity of the T-cell repertoire and to understand why some sequences are shared between individuals. We argue that the use of formal statistical inference methods, of the kind presented in this paper, will be essential for quantitative understanding of the generation and evolution of diversity in the adaptive immune system.
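The central difficulty stated above, that one observed sequence can arise from several hidden recombination scenarios, can be illustrated with a deliberately tiny sketch. It is not the paper's model: the two gene segments, the deletion and insertion distributions, and all function names below are hypothetical, and the real process also involves D and J genes and more than one insertion site. The sketch only shows how a generation probability is obtained by summing over every compatible scenario, and how the posterior over scenarios (the quantity an expectation-maximization style likelihood maximization would reweight by) follows from the same sum.

```python
# Toy model, assuming a single gene choice, 3' deletions, and uniform insertions.
# All parameter values are made up for illustration; none come from the paper.
V_GENES = {"V1": "ACGT", "V2": "ACGA"}   # hypothetical germline segments
P_V = {"V1": 0.6, "V2": 0.4}             # probability of choosing each gene
P_DEL = {0: 0.5, 1: 0.3, 2: 0.2}         # nucleotides deleted from the 3' end
P_INS = {0: 0.4, 1: 0.35, 2: 0.25}       # number of inserted nucleotides
P_NT = 0.25                              # each inserted nucleotide is uniform


def scenarios(seq):
    """Yield (gene, n_del, n_ins, prob) for each hidden scenario producing seq."""
    for name, gene in V_GENES.items():
        for n_del, p_del in P_DEL.items():
            kept = gene[: len(gene) - n_del]
            n_ins = len(seq) - len(kept)
            # The scenario is compatible only if the insertion count is allowed
            # and the observed sequence starts with the retained germline part.
            if n_ins not in P_INS or not seq.startswith(kept):
                continue
            yield name, n_del, n_ins, P_V[name] * p_del * P_INS[n_ins] * P_NT**n_ins


def generation_probability(seq):
    """Sum the probabilities of all scenarios that generate seq."""
    return sum(p for *_, p in scenarios(seq))


def scenario_posterior(seq):
    """Posterior probability of each hidden scenario given the observed seq."""
    hits = list(scenarios(seq))
    total = sum(p for *_, p in hits)
    if total == 0:
        return {}
    return {(g, d, i): p / total for g, d, i, p in hits}


if __name__ == "__main__":
    print(generation_probability("ACGT"))
    for scenario, weight in scenario_posterior("ACGT").items():
        print(scenario, round(weight, 4))
```

In this toy setting the sequence `ACGT` is reachable both from `V1` without any editing and from either gene after deletion followed by insertion, so gene choice and deletion length cannot be read directly off the sequence; they can only be weighted probabilistically.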