Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, NC, USA.
Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Commun Biol. 2021 May 26;4(1):629. doi: 10.1038/s42003-021-02146-6.
The increasing availability of single-cell data is revolutionizing the understanding of biological mechanisms at cellular resolution. For differential expression analysis in multi-subject single-cell data, negative binomial mixed models account for both subject-level and cell-level overdispersion, but they are computationally demanding. Here, we propose NEBULA, an efficient NEgative Binomial mixed model Using a Large-sample Approximation. The speed gain is achieved by solving high-dimensional integrals analytically instead of using the Laplace approximation. We demonstrate that NEBULA is orders of magnitude faster than existing tools and controls false-positive errors in marker gene identification and co-expression analysis. Applying NEBULA to Alzheimer's disease cohort data sets, we found that the cell-level expression of APOE correlated with that of other genetic risk factors (including CLU, CST3, TREM2, C1q, and ITM2B) in a cell-type-specific pattern and an isoform-dependent manner in microglia. NEBULA opens up a new avenue for the broad application of mixed models to large-scale multi-subject single-cell data.
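To make the two sources of overdispersion named in the abstract concrete, here is a minimal simulation sketch. It is not the NEBULA estimation procedure and not its API. It only generates counts of the kind such a model is meant to describe. The function name `simulate_counts`, its parameters, and the choice of gamma-distributed subject effects are assumptions made for illustration; the abstract does not specify the random-effect distribution.

```python
import numpy as np

def simulate_counts(n_subjects, cells_per_subject, mean,
                    subject_var, cell_disp, seed=0):
    """Simulate counts for one gene across cells nested within subjects.

    Hypothetical illustration only (not the NEBULA implementation):
    - subject-level overdispersion: each subject's mean is scaled by a
      gamma effect with mean 1 and variance `subject_var` (an assumption)
    - cell-level overdispersion: each cell draws a gamma-Poisson count,
      i.e. negative binomial with variance mu + cell_disp * mu**2
    """
    rng = np.random.default_rng(seed)

    # One multiplicative random effect per subject (mean 1).
    subject_effect = rng.gamma(shape=1.0 / subject_var,
                               scale=subject_var, size=n_subjects)

    # Expand the subject-specific mean to every cell of that subject.
    mu = np.repeat(mean * subject_effect, cells_per_subject)

    # Cell-level gamma noise (mean 1) followed by Poisson sampling
    # yields a negative binomial count for each cell.
    cell_effect = rng.gamma(shape=1.0 / cell_disp,
                            scale=cell_disp, size=mu.size)
    counts = rng.poisson(mu * cell_effect)

    subject_ids = np.repeat(np.arange(n_subjects), cells_per_subject)
    return subject_ids, counts
```

Calling, for example, `simulate_counts(50, 200, mean=5.0, subject_var=0.5, cell_disp=0.5)` returns one subject label and one count per cell. Comparing `counts.var()` with `counts.mean()` shows variance well above the Poisson expectation, which is the behaviour a mixed model with both subject-level and cell-level terms is designed to capture.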