Jagota Milind, Hsu Chloe, Mazumder Thomas, Sung Kevin, DeWitt William S, Listgarten Jennifer, Matsen Frederick A, Ye Chun Jimmie, Song Yun S
Computer Science Division, UC Berkeley, Berkeley, CA, USA.
Division of Rheumatology, Department of Medicine, UCSF, San Francisco, CA, USA.
bioRxiv. 2024 Oct 25:2024.10.22.619760. doi: 10.1101/2024.10.22.619760.
Antibodies and B-cell receptors (BCRs) are produced by B cells and are built of a heavy chain and a light chain. Although each B cell could express two different heavy chains and four different light chains, usually only a single pair of heavy chain and light chain is expressed, a phenomenon known as allelic exclusion. However, a small fraction of naive B cells violate allelic exclusion by expressing two productive light chains, one of which has impaired function; this has been called light chain allelic inclusion. We demonstrate that these B cells can be used to learn constraints on antibody sequence. Using large-scale single-cell sequencing data from humans, we find examples of light chain allelic inclusion in thousands of naive B cells, an order of magnitude more than in existing datasets. We train machine learning models to identify the abnormal sequences in these cells. The resulting models correlate with antibody properties that they were not trained on, including polyreactivity, surface expression, and mutation usage in affinity maturation. These correlations are larger than those achieved by existing antibody modeling approaches, indicating that allelic inclusion data contain useful new information. We also investigate the impact of similar selection forces on the heavy chain in mouse and observe that pairing with the surrogate light chain significantly restricts heavy chain diversity.