Department of Biomedical Data Science, Dartmouth College, Hanover, New Hampshire, United States of America.
PLoS Comput Biol. 2023 Aug 21;19(8):e1011413. doi: 10.1371/journal.pcbi.1011413. eCollection 2023 Aug.
The accurate estimation of cell surface receptor abundance for single cell transcriptomics data is important for the tasks of cell type and phenotype categorization and cell-cell interaction quantification. We previously developed an unsupervised receptor abundance estimation technique named SPECK (Surface Protein abundance Estimation using CKmeans-based clustered thresholding) to address the challenges associated with accurate abundance estimation. In that paper, we concluded that SPECK results in improved concordance with Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) data relative to comparative unsupervised abundance estimation techniques using only single-cell RNA-sequencing (scRNA-seq) data. In this paper, we outline a new supervised receptor abundance estimation method called STREAK (gene Set Testing-based Receptor abundance Estimation using Adjusted distances and cKmeans thresholding) that leverages associations learned from joint scRNA-seq/CITE-seq training data and a thresholded gene set scoring mechanism to estimate receptor abundance for scRNA-seq target data. We evaluate STREAK relative to both unsupervised and supervised receptor abundance estimation techniques using two evaluation approaches on six joint scRNA-seq/CITE-seq datasets that represent four human and mouse tissue types. We conclude that STREAK outperforms other abundance estimation strategies and provides a more biologically interpretable and transparent statistical model.
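As summarized above, STREAK has three stages. It learns a receptor-specific gene set from joint scRNA-seq/CITE-seq training data, scores each target cell against that gene set, and applies Ckmeans-based thresholding to the score. The Python sketch below illustrates that overall pipeline under simplified assumptions. It is not the authors' implementation, and it makes the following assumptions:

- The gene set is the top `size` genes ranked by Pearson correlation with the training CITE-seq abundance.
- The per-cell score is a plain mean of the gene set's expression. This is a stand-in for the adjusted-distance scoring that the method's name refers to.
- Thresholding is an exact one-dimensional k-means with k = 2, which is the two-cluster case of the optimal univariate clustering computed by the R package Ckmeans.1d.dp. Cells in the low cluster are set to zero.

All function names (`learn_gene_set`, `score_cells`, `two_means_cut`, `estimate_abundance`) and the toy data are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def learn_gene_set(train_rna, train_adt, target_genes, size):
    """Hypothetical gene set: the `size` genes (also present in the target data) whose
    training expression correlates most strongly with the receptor's CITE-seq abundance."""
    candidates = [g for g in train_rna if g in target_genes]
    ranked = sorted(candidates, key=lambda g: pearson(train_rna[g], train_adt), reverse=True)
    return ranked[:size]


def score_cells(target_rna, gene_set):
    """Stand-in scoring (assumption): mean expression of the gene set in each target cell."""
    n_cells = len(target_rna[gene_set[0]])
    return [float(mean(target_rna[g][i] for g in gene_set)) for i in range(n_cells)]


def two_means_cut(values):
    """Exact 1-D k-means with k = 2. Optimal 1-D clusters are contiguous in sorted
    order, so scanning every split point finds the optimum. Returns the largest
    value in the low cluster, or None if all values are equal."""
    xs = sorted(values)
    n = len(xs)
    if n < 2 or xs[0] == xs[-1]:
        return None
    s1, s2 = [0.0], [0.0]  # prefix sums of x and x^2
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def sse(i, j):  # within-cluster sum of squared deviations of xs[i:j]
        s = s1[j] - s1[i]
        return (s2[j] - s2[i]) - s * s / (j - i)

    best = min(range(1, n), key=lambda k: sse(0, k) + sse(k, n))
    return xs[best - 1]


def estimate_abundance(train_rna, train_adt, target_rna, size=2):
    """Learn a gene set on training data, score target cells, zero the low cluster."""
    genes = learn_gene_set(train_rna, train_adt, target_rna.keys(), size)
    scores = score_cells(target_rna, genes)
    cut = two_means_cut(scores)
    if cut is None:
        return scores
    return [0.0 if s <= cut else s for s in scores]


if __name__ == "__main__":
    # Toy data (hypothetical): gene -> per-cell expression; one receptor's CITE-seq values.
    train_adt = [0, 0, 1, 2, 3, 4]
    train_rna = {"A": [0, 0, 1, 2, 3, 4], "B": [0, 1, 1, 2, 2, 4],
                 "C": [4, 3, 2, 1, 0, 0], "D": [1, 1, 1, 1, 1, 2]}
    target_rna = {"A": [0, 0, 5, 6], "B": [0, 1, 5, 5],
                  "C": [3, 2, 0, 0], "D": [1, 1, 1, 2]}
    print(estimate_abundance(train_rna, train_adt, target_rna))  # [0.0, 0.0, 5.0, 5.5]
```

Fixing k = 2 keeps the sketch short. Because optimal one-dimensional k-means clusters are contiguous in sorted order, scanning every split point with prefix sums is exact. A closer rendering of STREAK would replace the mean score with its adjusted-distance statistic and let the clustering step choose the number of clusters.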