Negative binomial mixture model for identification of noise in antibody-antigen specificity predictions from single-cell data.

Author information

Wasdin Perry T, Abu-Shmais Alexandra A, Irvin Michael W, Vukovich Matthew J, Georgiev Ivelin S

Affiliations

Program in Chemical and Physical Biology, Vanderbilt University Medical Center, Nashville, TN, 37232, United States.

Center for Computational Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, 37232, United States.

Publication information

Bioinform Adv. 2024 Dec 4;4(1):vbae170. doi: 10.1093/bioadv/vbae170. eCollection 2024.

Abstract

MOTIVATION

LIBRA-seq (linking B cell receptor to antigen specificity by sequencing) provides a powerful tool for interrogating the antigen-specific B cell compartment and identifying antibodies against antigen targets of interest. Identification of noise in single-cell B cell receptor sequencing data, such as LIBRA-seq, is critical for improving antigen binding predictions for downstream applications including antibody discovery and machine learning technologies.

RESULTS

In this study, we present a method for denoising LIBRA-seq data by clustering antigen counts into signal and noise components with a negative binomial mixture model. This approach leverages single-cell sequencing reads from a large, multi-donor dataset described in a recent LIBRA-seq study to develop a data-driven means for identification of technical noise. We apply this method to nine donors representing separate LIBRA-seq experiments and show that our approach provides improved predictions for antibody-antigen binding when compared to the standard scoring method, despite variance in data size and noise structure across samples. This development will improve the ability of LIBRA-seq to identify antigen-specific B cells and contribute to providing more reliable datasets for machine learning based approaches as the corpus of single-cell B cell sequencing data continues to grow.
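To make the idea of clustering antigen counts into signal and noise components concrete, here is a minimal sketch of a two-component negative binomial mixture fitted by expectation-maximization. It is not the authors' implementation (their code is in the repository listed under Availability and Implementation). The function names `nb_logpmf`, `posteriors`, and `fit_nb_mixture`, the choice of exactly two components, the quantile-based initialization, the dispersion bounds, and the fitting of a single antigen's count vector at a time are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp
from scipy.stats import nbinom


def nb_logpmf(counts, mu, r):
    """NB log-pmf parameterized by mean mu and dispersion r (var = mu + mu**2 / r)."""
    return nbinom.logpmf(counts, r, r / (r + mu))


def posteriors(counts, weights, mu, r):
    """Posterior probability of each component for every count (rows sum to 1)."""
    log_joint = np.log(weights) + nb_logpmf(counts[:, None], mu, r)
    return np.exp(log_joint - logsumexp(log_joint, axis=1, keepdims=True))


def fit_nb_mixture(counts, n_iter=50):
    """Fit a two-component NB mixture by EM (illustrative sketch only).

    Returns mixing weights, means, and dispersions sorted so that index 0 is
    the low-mean ("noise") component and index 1 the high-mean ("signal")
    component, plus the posterior probability of "signal" for each cell.
    """
    counts = np.asarray(counts)
    # Assumed initialization: a low-mean and a high-mean starting component.
    weights = np.array([0.5, 0.5])
    mu = np.array([np.quantile(counts, 0.25) + 0.5,
                   np.quantile(counts, 0.95) + 1.0])
    r = np.array([1.0, 1.0])

    for _ in range(n_iter):
        # E-step: soft assignment of every cell to the two components.
        resp = posteriors(counts, weights, mu, r)

        # M-step: update the mixing weights (kept strictly positive).
        weights = np.clip(resp.mean(axis=0), 1e-9, None)
        weights /= weights.sum()

        for k in range(2):
            w = resp[:, k]
            # For a fixed dispersion, the weighted MLE of the NB mean is the
            # weighted average of the counts.
            mu[k] = max(np.sum(w * counts) / max(w.sum(), 1e-12), 1e-6)

            # The dispersion has no closed form; search over log(r) within
            # assumed bounds.
            def neg_ll(log_r, w=w, m=mu[k]):
                return -np.sum(w * nb_logpmf(counts, m, np.exp(log_r)))

            r[k] = np.exp(
                minimize_scalar(neg_ll, bounds=(-5.0, 8.0), method="bounded").x
            )

    order = np.argsort(mu)
    weights, mu, r = weights[order], mu[order], r[order]
    p_signal = posteriors(counts, weights, mu, r)[:, 1]
    return weights, mu, r, p_signal
```

Given one antigen's counts across cells, `fit_nb_mixture(counts)` returns the fitted parameters and a per-cell posterior probability of belonging to the higher-mean component. A cell could then be called antigen-positive when that probability exceeds a chosen cutoff; any specific cutoff is a modeling decision that this sketch does not take from the paper.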

AVAILABILITY AND IMPLEMENTATION

All data and code are available at https://github.com/IGlab-VUMC/mixture_model_denoising.

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f9f1/11631427/b44ec2c13ba6/vbae170f1.jpg
