Erlach Lena, Kuhn Raphael, Agrafiotis Andreas, Shlesinger Danielle, Yermanos Alexander, Reddy Sai T
Department of Biosystems Science and Engineering, ETH Zurich, 4057 Basel, Switzerland.
Department of Biosystems Science and Engineering, ETH Zurich, 4057 Basel, Switzerland; Institute of Microbiology, ETH Zurich, 8049 Zurich, Switzerland.
Cell Syst. 2024 Dec 18;15(12):1295-1303.e5. doi: 10.1016/j.cels.2024.11.005. Epub 2024 Dec 10.
The field of antibody discovery typically involves extensive experimental screening of B cells from immunized animals. Machine learning (ML)-guided prediction of antigen-specific B cells could accelerate this process but requires sufficient training data with antigen-specificity labeling. Here, we introduce a dataset of single-cell transcriptome and antibody repertoire sequencing of B cells from immunized mice, which are labeled as antigen specific or non-specific through experimental selections. We identify gene expression patterns associated with antigen specificity by differential gene expression analysis and assess their antibody sequence diversity. Subsequently, we benchmark various ML models, both linear and non-linear, trained on different combinations of gene expression and antibody repertoire features. Additionally, we assess transfer learning using features from general and antibody-specific protein language models (PLMs). Our findings show that gene expression-based models outperform sequence-based models for antigen-specificity predictions, highlighting a promising avenue for computationally guided antibody discovery.
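The abstract describes benchmarking models trained on different combinations of gene expression and antibody repertoire features. The sketch below shows, under stated assumptions, what one such comparison could look like. It is not the authors' pipeline, and several pieces are hypothetical choices made only for illustration: the feature-mode names "gex", "seq" and "gex+seq", flattened one-hot encoding of CDR3 amino-acid sequences, a plain logistic regression fitted by gradient descent as the linear baseline, and ROC AUC as the metric. The paper's non-linear models and protein language model embeddings are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_cdr3(seq: str, max_len: int) -> List[float]:
    """Flattened one-hot CDR3 encoding, zero-padded (or truncated) to max_len residues."""
    vec = [0.0] * (max_len * len(AMINO_ACIDS))
    for pos, aa in enumerate(seq[:max_len]):
        idx = AMINO_ACIDS.find(aa)
        if idx >= 0:  # residues outside the 20-letter alphabet stay all-zero
            vec[pos * len(AMINO_ACIDS) + idx] = 1.0
    return vec


def build_features(expr: Sequence[float], cdr3: str, mode: str, max_len: int = 20) -> List[float]:
    """Hypothetical feature combinations: gene expression, sequence, or their concatenation."""
    if mode == "gex":
        return list(expr)
    if mode == "seq":
        return one_hot_cdr3(cdr3, max_len)
    if mode == "gex+seq":
        return list(expr) + one_hot_cdr3(cdr3, max_len)
    raise ValueError(f"unknown feature mode: {mode}")


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: List[List[float]], y: List[int], lr: float = 0.1,
                 epochs: int = 500) -> Tuple[List[float], float]:
    """Linear baseline: unregularized logistic regression, full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model: Tuple[List[float], float], X: List[List[float]]) -> List[float]:
    """Predicted probability that each cell is antigen specific."""
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(random positive outscores random negative); ties count as 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up cells (2 genes + CDR3) purely to exercise the code; not data from the paper.
    cells = [([2.0, 0.1], "CARDY"), ([1.8, 0.2], "CARGF"),
             ([0.1, 1.9], "CTRDY"), ([0.2, 2.1], "CAKGW")]
    labels = [1, 1, 0, 0]  # 1 = antigen specific, 0 = non-specific
    for mode in ("gex", "seq", "gex+seq"):
        X = [build_features(e, s, mode) for e, s in cells]
        model = train_logreg(X, labels)
        print(mode, roc_auc(labels, predict_proba(model, X)))
```

In a real benchmark the AUC would be computed on held-out cells (for example with cross-validation) rather than on the training set as in this toy run, and non-linear models would be compared on the same feature modes.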