Department of Integrative Biology and Physiology, University of California, Los Angeles, California, United States of America.
Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, California, United States of America.
PLoS Comput Biol. 2024 Sep 6;20(9):e1012386. doi: 10.1371/journal.pcbi.1012386. eCollection 2024 Sep.
Effective analysis of single-cell RNA sequencing (scRNA-seq) data requires a rigorous distinction between technical noise and biological variation. In this work, we propose a simple feature selection model, termed "Differentially Distributed Genes" or DDGs, where a binomial sampling process for each mRNA species produces a null model of technical variation. Using scRNA-seq data where cell identities have been established a priori, we find that the DDG model of biological variation outperforms existing methods. We demonstrate that DDGs distinguish a validated set of real biologically varying genes, minimize neighborhood distortion, and enable accurate partitioning of cells into their established cell-type groups.
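The abstract does not spell out the test statistic, so the following is only a minimal sketch of how a binomial sampling null of technical variation could be used to flag candidate genes. It assumes (our assumption, not stated in the abstract) that under the null every cell shares the same pooled fraction p of a given mRNA species, so a cell with sequencing depth n yields a count distributed as Binomial(n, p) and a zero with probability (1 - p)^n. The function name `binomial_null_zero_scores`, the zero-count z-score, and the simulated data are all hypothetical illustrations rather than the authors' implementation.

```python
import math
import random


def binomial_null_zero_scores(counts):
    """Score each gene by its excess of zero-count cells over a binomial null.

    counts: list of cells, each a list of per-gene integer molecule counts.
    Returns one z-score per gene; large positive values mean more zeros than
    pure binomial sampling from a shared pool would predict.
    """
    n_genes = len(counts[0])
    depth = [sum(cell) for cell in counts]   # total molecules captured per cell
    total = sum(depth)
    scores = []
    for g in range(n_genes):
        gene_total = sum(cell[g] for cell in counts)
        p = gene_total / total               # pooled fraction of gene g
        # Null: count in cell c ~ Binomial(depth_c, p), so P(zero) = (1 - p)^depth_c
        q = [(1.0 - p) ** n for n in depth]
        expected = sum(q)                            # expected number of zero cells
        variance = sum(qi * (1.0 - qi) for qi in q)  # sum of Bernoulli variances
        observed = sum(1 for cell in counts if cell[g] == 0)
        z = (observed - expected) / math.sqrt(variance) if variance > 0 else 0.0
        scores.append(z)
    return scores


# Hypothetical simulated data: two cell types, 100 cells each, 200 molecules per cell.
# Gene 0 is expressed only in type A; gene 1 has the same fraction in both types.
random.seed(0)
type_a = [0.10, 0.02, 0.28, 0.30, 0.30]
type_b = [0.00, 0.02, 0.38, 0.30, 0.30]
cells = []
for probs in [type_a] * 100 + [type_b] * 100:
    # Drawing 200 molecules by gene fraction gives binomial counts for each gene
    draws = random.choices(range(5), weights=probs, k=200)
    cells.append([draws.count(g) for g in range(5)])

scores = binomial_null_zero_scores(cells)
for g, z in enumerate(scores):
    print(f"gene {g}: zero-excess z = {z:.2f}")
```

In this toy setup gene 0 should receive a very large score, while gene 1 stays near the null. Gene 2 also differs between the two types, but it is expressed too highly to produce any zeros, which shows the limit of a purely zero-based statistic and why this should be read as an illustration of the null model rather than a full method.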