Tara Chari, Gennady Gorin, Lior Pachter
Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California.
Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California.
bioRxiv. 2023 Sep 19:2023.09.17.558131. doi: 10.1101/2023.09.17.558131.
Multimodal single-cell genomics technologies enable simultaneous capture of multiple facets of DNA and RNA processing in the cell. This creates opportunities for transcriptome-wide, mechanistic studies of cellular processing in heterogeneous cell types, with applications ranging from inferring kinetic differences between cells to studying the role of stochasticity in driving heterogeneity. However, current methods for determining the cell types or 'clusters' present in multimodal data often rely on ad hoc or independent treatment of modalities, and on assumptions that ignore inherent properties of the count data. To enable interpretable and consistent cell cluster determination from multimodal data, we present meK-Means (mechanistic K-Means), which integrates modalities and learns underlying, shared biophysical states through a unifying model of transcription. In particular, we demonstrate how meK-Means can be used to cluster cells from unspliced and spliced mRNA count modalities. By utilizing the causal, physical relationships underlying these modalities, we identify shared transcriptional kinetics across cells, which induce the observed gene expression profiles, and provide an alternative definition for 'clusters' through the governing parameters of cellular processes.
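The idea of defining clusters by shared kinetic parameters can be illustrated with a minimal sketch: a K-means-style hard-assignment loop in which each cluster is summarized by per-gene kinetic parameters instead of a centroid, and each cell is assigned to the cluster under which its unspliced and spliced counts are most likely. As a simplifying assumption, the sketch uses the constitutive two-stage model (transcription rate k, splicing rate β, degradation rate γ), whose steady state is a product of Poisson distributions with means k/β (unspliced) and k/γ (spliced); it also treats genes as independent and ignores technical noise. Only these two means are identifiable from steady-state counts, so they are estimated directly. This is not necessarily the transcription model or fitting procedure used by meK-Means, and every function name below (`mechanistic_kmeans`, `hard_em`, `fit_all`, and so on) is hypothetical.

```python
import math
import random

# Hypothetical illustration, NOT the meK-Means implementation: clusters are
# defined by per-gene kinetic parameters under an ASSUMED constitutive model.


def poisson_logpmf(x, mu):
    """log P(X = x) for X ~ Poisson(mu); mu is floored to avoid log(0)."""
    mu = max(mu, 1e-12)
    return x * math.log(mu) - mu - math.lgamma(x + 1)


def cell_loglik(u, s, params):
    """Log-likelihood of one cell's unspliced (u) and spliced (s) counts.

    params holds one (mu_u, mu_s) = (k/beta, k/gamma) pair per gene; the
    assumed steady state is U ~ Poisson(mu_u), S ~ Poisson(mu_s), with U, S
    and all genes treated as independent.
    """
    return sum(poisson_logpmf(ug, mu_u) + poisson_logpmf(sg, mu_s)
               for ug, sg, (mu_u, mu_s) in zip(u, s, params))


def fit_all(U, S, labels, n_clusters, rng):
    """Poisson MLEs per cluster: per-gene mean unspliced and spliced counts."""
    params = []
    for j in range(n_clusters):
        members = [c for c, lab in enumerate(labels) if lab == j]
        if not members:  # reseed an empty cluster with one random cell
            members = [rng.randrange(len(U))]
        n = len(members)
        params.append([(sum(U[c][g] for c in members) / n,
                        sum(S[c][g] for c in members) / n)
                       for g in range(len(U[0]))])
    return params


def hard_em(U, S, n_clusters, rng, n_iter=50):
    """One K-means-style run: alternate parameter fits and reassignments."""
    labels = [rng.randrange(n_clusters) for _ in U]
    for _ in range(n_iter):
        params = fit_all(U, S, labels, n_clusters, rng)
        # Assign each cell to the cluster whose kinetics make it most likely.
        new = [max(range(n_clusters),
                   key=lambda j: cell_loglik(U[c], S[c], params[j]))
               for c in range(len(U))]
        if new == labels:
            break
        labels = new
    params = fit_all(U, S, labels, n_clusters, rng)
    total = sum(cell_loglik(U[c], S[c], params[lab])
                for c, lab in enumerate(labels))
    return total, labels, params


def mechanistic_kmeans(U, S, n_clusters, n_init=10, seed=0):
    """Run several random restarts; keep the highest total log-likelihood."""
    rng = random.Random(seed)
    runs = [hard_em(U, S, n_clusters, rng) for _ in range(n_init)]
    _, labels, params = max(runs, key=lambda run: run[0])
    return labels, params


def sample_poisson(rng, mu):
    """Knuth's multiplication method; adequate for small means."""
    limit, n, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return n
        n += 1


# Hypothetical toy data: two populations with identical spliced means (20)
# but different unspliced means (10 vs 2), i.e. different gamma/beta ratios.
rng = random.Random(1)
n_genes = 10
U, S = [], []
for mu_u in [10.0] * 50 + [2.0] * 50:
    U.append([sample_poisson(rng, mu_u) for _ in range(n_genes)])
    S.append([sample_poisson(rng, 20.0) for _ in range(n_genes)])

labels, params = mechanistic_kmeans(U, S, n_clusters=2)
print(labels[:50].count(labels[0]), labels[50:].count(labels[50]))
```

In this toy example the two populations have the same spliced-count distribution, so spliced counts alone carry no signal to separate them; the unspliced counts, tied to the spliced counts through the shared kinetic model, are what recover the split. Restarting from several random labellings and keeping the best total log-likelihood follows the usual K-means practice of multiple initializations, since a single hard-assignment run can stall in a poor local optimum.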