Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720.
Department of Statistics, University of Michigan, Ann Arbor, MI 48109.
Proc Natl Acad Sci U S A. 2023 May 23;120(21):e2209124120. doi: 10.1073/pnas.2209124120. Epub 2023 May 16.
Detecting differentially expressed genes is important for characterizing subpopulations of cells. In scRNA-seq data, however, nuisance variation due to technical factors like sequencing depth and RNA capture efficiency obscures the underlying biological signal. Deep generative models have been extensively applied to scRNA-seq data, with a special focus on embedding cells into a low-dimensional latent space and correcting for batch effects. However, little attention has been paid to the problem of utilizing the uncertainty from the deep generative model for differential expression (DE). Furthermore, the existing approaches do not allow for controlling for effect size or the false discovery rate (FDR). Here, we present lvm-DE, a generic Bayesian approach for performing DE predictions from a fitted deep generative model, while controlling the FDR. We apply the lvm-DE framework to scVI and scSphere, two deep generative models. The resulting approaches outperform state-of-the-art methods at estimating the log fold change in gene expression levels as well as detecting differentially expressed genes between subpopulations of cells.
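As an illustration of what controlling the FDR from posterior uncertainty can look like, here is a minimal sketch of a common Bayesian decision rule. It is an assumption about the general technique, not the procedure confirmed by the abstract. Given posterior draws of the log fold change (LFC) for each gene, it estimates the posterior probability that |LFC| exceeds an effect-size threshold, then flags the largest set of genes whose posterior expected FDR stays at or below a target level. The function name `select_de_genes`, the parameters `delta` and `alpha`, and the input layout are hypothetical and do not come from lvm-DE, scVI, or scSphere.

```python
import numpy as np


def select_de_genes(lfc_samples, delta=0.5, alpha=0.05):
    """Flag genes as DE while bounding the posterior expected FDR.

    lfc_samples: array of shape (n_samples, n_genes) holding posterior
    draws of the log fold change (hypothetical input layout).
    delta: effect-size threshold on |LFC| (assumed parameter).
    alpha: target posterior expected FDR (assumed parameter).
    """
    lfc_samples = np.asarray(lfc_samples, dtype=float)

    # Monte Carlo estimate of P(|LFC| > delta | data) for each gene.
    p_de = (np.abs(lfc_samples) > delta).mean(axis=0)

    # Rank genes from most to least likely to be differentially expressed.
    order = np.argsort(-p_de, kind="stable")

    # Posterior expected FDR of the top-k set is the mean of (1 - p_de)
    # over those k genes; it is non-decreasing in k.
    expected_fdr = np.cumsum(1.0 - p_de[order]) / np.arange(1, p_de.size + 1)

    # Largest k whose expected FDR does not exceed alpha.
    k = int(np.count_nonzero(expected_fdr <= alpha))

    is_de = np.zeros(p_de.size, dtype=bool)
    is_de[order[:k]] = True
    return p_de, is_de
```

In this sketch, `delta` plays the role of the effect-size control and `alpha` the role of the FDR target. In practice the posterior draws would come from the fitted generative model rather than being supplied by hand.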