Wasdin Perry T, Abu-Shmais Alexandra A, Irvin Michael W, Vukovich Matthew J, Georgiev Ivelin S
Program in Chemical and Physical Biology, Vanderbilt University Medical Center, Nashville, TN, 37232, United States.
Center for Computational Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, 37232, United States.
Bioinform Adv. 2024 Dec 4;4(1):vbae170. doi: 10.1093/bioadv/vbae170. eCollection 2024.
LIBRA-seq (linking B cell receptor to antigen specificity by sequencing) provides a powerful tool for interrogating the antigen-specific B cell compartment and identifying antibodies against antigen targets of interest. Identifying noise in single-cell B cell receptor sequencing data, such as those generated by LIBRA-seq, is critical for improving antigen binding predictions for downstream applications, including antibody discovery and machine learning technologies.
In this study, we present a method for denoising LIBRA-seq data by clustering antigen counts into signal and noise components with a negative binomial mixture model. This approach leverages single-cell sequencing reads from a large, multi-donor dataset described in a recent LIBRA-seq study to develop a data-driven means of identifying technical noise. We apply this method to nine donors representing separate LIBRA-seq experiments and show that our approach provides improved predictions of antibody-antigen binding compared with the standard scoring method, despite variation in data size and noise structure across samples. This development will improve the ability of LIBRA-seq to identify antigen-specific B cells and help provide more reliable datasets for machine learning-based approaches as the corpus of single-cell B cell sequencing data continues to grow.
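To illustrate the general idea of splitting one antigen's per-cell counts into a low-mean "noise" component and a high-mean "signal" component, here is a minimal Python sketch. It is not the authors' implementation (their code is in the repository linked below). It assumes two components per antigen, each negative binomial with mean mu and dispersion r (variance mu + mu^2/r). It fits them by EM, but the M-step uses weighted moment matching for r instead of a full maximum-likelihood update, so it is only an approximation. The quartile initialisation, the 0.5 posterior cutoff, and the synthetic parameters in the demo are illustrative assumptions, and all function names are hypothetical.

```python
import math
import random


def nb_logpmf(k, mu, r):
    """Log-pmf of a negative binomial with mean mu and dispersion (size) r."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))


def posteriors(counts, ws, mus, rs):
    """E-step: per-cell component probabilities and the total log-likelihood."""
    resp, loglik = [], 0.0
    for k in counts:
        a = [math.log(ws[j]) + nb_logpmf(k, mus[j], rs[j]) for j in range(2)]
        m = max(a)
        lse = m + math.log(sum(math.exp(x - m) for x in a))  # log-sum-exp
        loglik += lse
        resp.append([math.exp(x - lse) for x in a])
    return resp, loglik


def fit_nb_mixture(counts, n_iter=500, tol=1e-6):
    """Hypothetical two-component NB mixture fit (EM, moment-matching M-step).

    Returns (weights, means, dispersions, p_signal); component 1 is the
    higher-mean ("signal") component and p_signal[i] is its posterior for cell i.
    """
    n = len(counts)
    s = sorted(counts)
    lo = max(s[n // 4], 0.5)              # noise mean starts at the lower quartile
    hi = max(s[(3 * n) // 4], lo + 1.0)   # signal mean starts at the upper quartile
    ws, mus, rs = [0.5, 0.5], [lo, hi], [1.0, 1.0]
    prev = -math.inf
    for _ in range(n_iter):
        resp, loglik = posteriors(counts, ws, mus, rs)
        for j in range(2):
            nj = sum(g[j] for g in resp) + 1e-12
            ws[j] = max(nj / n, 1e-12)
            mu = sum(g[j] * k for g, k in zip(resp, counts)) / nj
            var = sum(g[j] * (k - mu) ** 2 for g, k in zip(resp, counts)) / nj
            mus[j] = max(mu, 1e-6)
            # Match var = mu + mu^2 / r; fall back to near-Poisson if underdispersed.
            rs[j] = mus[j] ** 2 / (var - mus[j]) if var > mus[j] else 1e6
        if abs(loglik - prev) < tol:
            break
        prev = loglik
    if mus[0] > mus[1]:  # keep the higher-mean component last
        ws, mus, rs = ws[::-1], mus[::-1], rs[::-1]
    resp, _ = posteriors(counts, ws, mus, rs)
    return ws, mus, rs, [g[1] for g in resp]


def sample_nb(rng, mu, r):
    """Draw one NB(mu, r) count as a gamma-Poisson mixture (Knuth's Poisson sampler)."""
    lam = rng.gammavariate(r, mu / r)  # shape r, scale mu / r -> mean mu
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


# Demo on synthetic counts for a single antigen (made-up parameters).
rng = random.Random(0)
noise = [sample_nb(rng, 3.0, 2.0) for _ in range(300)]
signal = [sample_nb(rng, 60.0, 5.0) for _ in range(100)]
counts = noise + signal
ws, mus, rs, p_signal = fit_nb_mixture(counts)
calls = [p > 0.5 for p in p_signal]  # assumed cutoff for calling a cell "signal"
truth = [False] * len(noise) + [True] * len(signal)
accuracy = sum(c == t for c, t in zip(calls, truth)) / len(truth)
print(f"means={mus}, accuracy={accuracy:.3f}")
```

Working on raw counts, rather than on a single global threshold, lets each antigen's noise floor be learned from its own distribution. A full maximum-likelihood update of r (for example, by numerical optimisation) would be the more principled M-step.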
All data and code are available at https://github.com/IGlab-VUMC/mixture_model_denoising.