Marghi Yeganeh, Gala Rohan, Baftizadeh Fahimeh, Sümbül Uygar
Allen Institute, 615 Westlake Ave N, Seattle, WA, USA.
Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA.
bioRxiv. 2024 Jul 2:2023.10.02.560574. doi: 10.1101/2023.10.02.560574.
Reproducible definition and identification of cell types is essential for investigating their biological function and for understanding their relevance in the context of development, disease, and evolution. Current approaches either model variability in the data as continuous latent factors and then cluster as a separate step, or apply clustering directly to the data. We show that such approaches can make qualitative mistakes when identifying cell types, particularly when the number of cell types is in the hundreds or even thousands. Here, we propose an unsupervised method, MMIDAS, which combines a generalized mixture model with a multi-armed deep neural network to jointly infer the discrete type and the continuous, type-specific variability. Using four recent datasets of brain cells spanning different technologies, species, and conditions, we demonstrate that MMIDAS can identify reproducible cell types and infer cell type-dependent continuous variability in both uni-modal and multi-modal datasets.