Language model-based B cell receptor sequence embeddings can effectively encode receptor specificity.

Author affiliations

Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA.

Program in Applied Mathematics, Yale University, New Haven, CT, USA.

Publication information

Nucleic Acids Res. 2024 Jan 25;52(2):548-557. doi: 10.1093/nar/gkad1128.

DOI: 10.1093/nar/gkad1128

PMID: 38109302

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10810273/

Abstract

High throughput sequencing of B cell receptors (BCRs) is increasingly applied to study the immense diversity of antibodies. Learning biologically meaningful embeddings of BCR sequences is beneficial for predictive modeling. Several embedding methods have been developed for BCRs, but no direct performance benchmarking exists. Moreover, the impact of the input sequence length and paired-chain information on the prediction remains to be explored. We evaluated the performance of multiple embedding models to predict BCR sequence properties and receptor specificity. Despite the differences in model architectures, most embeddings effectively capture BCR sequence properties and specificity. BCR-specific embeddings slightly outperform general protein language models in predicting specificity. In addition, incorporating full-length heavy chains and paired light chain sequences improves the prediction performance of all embeddings. This study provides insights into the properties of BCR embeddings to improve downstream prediction applications for antibody analysis and discovery.
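
The kind of comparison the abstract describes (fixed sequence embeddings fed to a downstream predictor, with and without the paired light chain) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' pipeline: the `embed` function is a toy amino acid composition vector standing in for a real language-model embedding, the classifier is a simple nearest-centroid rule, and the sequences and labels in `records` are invented.

```python
from collections import Counter
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(seq):
    """Toy stand-in for a language-model embedding: amino acid frequencies."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def loo_accuracy(vectors, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i, (vec, label) in enumerate(zip(vectors, labels)):
        # Hold out item i and build one centroid per remaining class.
        train = [(v, l) for j, (v, l) in enumerate(zip(vectors, labels)) if j != i]
        classes = sorted({l for _, l in train})
        centroids = {c: centroid([v for v, l in train if l == c]) for c in classes}
        predicted = min(classes, key=lambda c: dist(vec, centroids[c]))
        correct += predicted == label
    return correct / len(vectors)


# Invented (heavy-chain fragment, light-chain fragment, label) records.
records = [
    ("CARDYYGSSYFDYW", "CQQYNSYPLTF", "binder"),
    ("CARDYYGSGYFDVW", "CQQYSSYPLTF", "binder"),
    ("CARDYWGSSYFDYW", "CQQYNSWPLTF", "binder"),
    ("CAKGGLRLGELSLW", "CMQALQTPRTF", "non-binder"),
    ("CAKGGIRLGEMSLW", "CMQGLQTPRTF", "non-binder"),
    ("CAKGALRLGELSIW", "CMQALQTPKTF", "non-binder"),
]
labels = [label for _, _, label in records]

# Heavy chain only versus heavy and light embeddings concatenated.
heavy_only = [embed(h) for h, _, _ in records]
paired = [embed(h) + embed(l) for h, l, _ in records]

print("heavy only:", loo_accuracy(heavy_only, labels))
print("paired    :", loo_accuracy(paired, labels))
```

In a real benchmark, `embed` would be replaced by the output of each embedding model under comparison, and the toy records by an annotated BCR dataset; the numbers this sketch prints say nothing about the paper's findings.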