Ghosh Tusharkanti, Baxter Ryan M, Seal Souvik, Lui Victor G, Rudra Pratyaydipta, Vu Thao, Hsieh Elena W Y, Ghosh Debashis
Department of Biostatistics & Informatics, Colorado School of Public Health, University of Colorado, Anschutz Medical Campus, Aurora, CO 80045, United States.
Department of Immunology and Microbiology, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States.
Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf399.
High-throughput single-cell sequencing can be used to rigorously evaluate cell specification and to identify intricate variations between groups or conditions. Many popular differential expression methods target differences in aggregate measurements (mean, median, sum), which limits them to detecting only global differential changes.
We present cytoKernel, a robust method for differential expression analysis of single-cell data using a kernel-based score test. cytoKernel is specifically designed to assess differential expression in single-cell RNA sequencing and high-dimensional flow or mass cytometry data using the full probability distribution pattern. It is based on kernel embeddings of the probability distributions of the single-cell data, computed from the pairwise divergence/distance between the distributions of subjects. It can detect both patterns involving changes in the aggregate and more elusive variations that are often overlooked due to the multimodal characteristics of single-cell data. We performed extensive benchmarks on both simulated and real datasets from mass cytometry and single-cell RNA sequencing. The cytoKernel procedure effectively controls the false discovery rate and shows favorable performance compared to existing methods, identifying more differential patterns than existing approaches. We apply cytoKernel to assess differences in gene expression and protein marker expression across cell subpopulations in various publicly available single-cell RNA-seq and mass cytometry datasets.
The methods described in this paper are implemented in the open-source R package cytoKernel, which is freely available from Bioconductor at http://bioconductor.org/packages/cytoKernel.
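To make the idea of a distribution-level kernel test concrete, the following is a minimal Python sketch and not the cytoKernel implementation (the package itself is written in R). Several choices are assumptions made purely for illustration: subject-level distributions are approximated by histograms, the pairwise divergence is Jensen-Shannon, the kernel is a Gaussian-type kernel with a median-heuristic bandwidth, and the p-value comes from a label permutation test rather than the analytic score test described in the abstract. All function names are hypothetical.

```python
import numpy as np

def subject_histograms(samples, bins):
    """Approximate each subject's distribution by a normalized histogram.

    samples: list of 1-D arrays, one array of cell-level values per subject.
    A small pseudo-count keeps every bin strictly positive.
    """
    hists = []
    for values in samples:
        counts, _ = np.histogram(values, bins=bins)
        smoothed = counts + 1e-9
        hists.append(smoothed / smoothed.sum())
    return np.array(hists)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = 0.5 * (p + q)
    return 0.5 * np.sum(p * np.log(p / m)) + 0.5 * np.sum(q * np.log(q / m))

def kernel_matrix(hists, bandwidth=None):
    """Subject-by-subject kernel built from pairwise divergences."""
    n = len(hists)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = js_divergence(hists[i], hists[j])
    if bandwidth is None:
        # Median heuristic (an assumption of this sketch).
        bandwidth = np.median(dist[np.triu_indices(n, k=1)])
        if bandwidth <= 0:
            bandwidth = 1.0
    return np.exp(-dist / bandwidth)

def kernel_score_test(K, groups, n_perm=999, seed=0):
    """Kernel statistic r' K r with a permutation p-value.

    r is the centered group indicator; permuting the labels stands in
    for the analytic null distribution of a score test.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(groups, dtype=float)

    def statistic(labels):
        r = labels - labels.mean()
        return r @ K @ r

    observed = statistic(y)
    null = np.array([statistic(rng.permutation(y)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value

# Simulated example: both groups have mean 0, but one is bimodal,
# so a comparison of group means would see little difference.
rng = np.random.default_rng(1)
bins = np.linspace(-5, 5, 21)
unimodal = [rng.normal(0.0, 1.0, 300) for _ in range(8)]
bimodal = [np.concatenate([rng.normal(-1.5, 0.5, 150),
                           rng.normal(1.5, 0.5, 150)]) for _ in range(8)]
K = kernel_matrix(subject_histograms(unimodal + bimodal, bins))
stat, p_value = kernel_score_test(K, [0] * 8 + [1] * 8)
print(f"statistic = {stat:.3f}, permutation p-value = {p_value:.4f}")
```

In the simulated example the two groups share the same mean, so the signal lies entirely in the shape of the distributions. This is the kind of difference that, according to the abstract, aggregate-based methods tend to miss.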