Ahlmann-Eltze Constantin, Huber Wolfgang
Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany.
Faculty of Biosciences, Heidelberg University, Heidelberg, Germany.
Nat Genet. 2025 Mar;57(3):659-667. doi: 10.1038/s41588-024-01996-0. Epub 2025 Jan 3.
Identifying gene expression differences in heterogeneous tissues across conditions is a fundamental biological task, enabled by multi-condition single-cell RNA sequencing (RNA-seq). Current data analysis approaches divide the constituent cells into clusters meant to represent cell types, but such discrete categorization tends to be an unsatisfactory model of the underlying biology. Here, we introduce latent embedding multivariate regression (LEMUR), a model that operates without, or before, commitment to discrete categorization. LEMUR (1) integrates data from different conditions, (2) predicts each cell's gene expression changes as a function of the conditions and its position in latent space and (3) for each gene, identifies a compact neighborhood of cells with consistent differential expression. We apply LEMUR to cancer, zebrafish development and spatial gradients in Alzheimer's disease, demonstrating its broad applicability.
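The three steps listed in the abstract can be illustrated with a deliberately simplified toy sketch. This is not the LEMUR algorithm or its software interface: the function name `toy_differential_expression`, its parameters, and every modeling choice here (per-condition centering plus a shared PCA basis as a crude stand-in for integration, ordinary least squares for prediction, nearest neighbors around the most-changed cell as the neighborhood) are assumptions made only to show the shape of the workflow for two conditions.

```python
import numpy as np

def toy_differential_expression(Y, cond, n_latent=2, n_neighbors=20):
    """Toy illustration only. Y: cells x genes matrix, cond: 0/1 label per cell."""
    Y = np.asarray(Y, dtype=float)
    cond = np.asarray(cond)

    # (1) Integrate (crude stand-in): centre each condition separately,
    # then project all cells onto a shared PCA basis.
    Yc = Y.copy()
    for c in (0, 1):
        Yc[cond == c] -= Yc[cond == c].mean(axis=0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    Z = Yc @ Vt[:n_latent].T  # latent position of each cell

    # (2) Predict: fit expression ~ [1, Z] by least squares per condition,
    # then evaluate both fits at every cell's latent position.
    X = np.column_stack([np.ones(len(Z)), Z])
    preds = []
    for c in (0, 1):
        beta, *_ = np.linalg.lstsq(X[cond == c], Y[cond == c], rcond=None)
        preds.append(X @ beta)
    delta = preds[1] - preds[0]  # cells x genes predicted change

    # (3) Neighborhood (assumed heuristic): for each gene, take the cell with
    # the largest absolute predicted change and its nearest latent neighbors.
    neighborhoods = []
    for g in range(Y.shape[1]):
        seed = np.argmax(np.abs(delta[:, g]))
        dist = np.linalg.norm(Z - Z[seed], axis=1)
        neighborhoods.append(np.argsort(dist)[:n_neighbors])
    return Z, delta, neighborhoods
```

Called with a cells-by-genes matrix and a 0/1 condition vector, the function returns the latent coordinates, a per-cell, per-gene predicted change, and one index array of neighboring cells per gene. No cluster labels enter at any point, which is the property the abstract emphasizes.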