Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08540, USA.
Bioinformatics. 2024 Jun 28;40(Suppl 1):i228-i236. doi: 10.1093/bioinformatics/btae221.
Recently developed spatial lineage tracing technologies induce somatic mutations at specific genomic loci in a population of growing cells and then measure these mutations in the sampled cells along with the physical locations of the cells. These technologies enable high-throughput studies of developmental processes over space and time. However, these applications rely on accurate reconstruction of a spatial cell lineage tree describing both past cell divisions and cell locations. Spatial lineage trees are related to phylogeographic models that have been well-studied in the phylogenetics literature. We demonstrate that standard phylogeographic models based on Brownian motion are inadequate to describe the spatial symmetric displacement (SD) of cells during cell division.
We introduce a new model for cell motility, the SD model, that includes symmetric displacements of daughter cells from the parental cell followed by independent diffusion of daughter cells. We show that this model more accurately describes the locations of cells in a real spatial lineage tracing of mouse embryonic stem cells. Combining the spatial SD model with an evolutionary model of DNA mutations, we obtain a phylogeographic model for spatial lineage tracing. Using this model, we devise a maximum likelihood framework, MOLLUSC (Maximum Likelihood Estimation Of Lineage and Location Using Single-Cell Spatial Lineage tracing Data), to co-estimate time-resolved branch lengths, spatial diffusion rate, and mutation rate. On both simulated and real data, we show that MOLLUSC accurately estimates all parameters. In contrast, the Brownian motion model overestimates the spatial diffusion rate in all test cases. In addition, the inclusion of spatial information improves the accuracy of branch length estimation compared to sequence data alone. On real data, we show that spatial information has more signal than sequence data for branch length estimation, suggesting that augmenting lineage tracing technologies with spatial information is useful for overcoming the limitations of genome editing in developmental systems.
The Python implementation of MOLLUSC is available at https://github.com/raphael-group/MOLLUSC.
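To make the contrast between the SD model and plain Brownian motion concrete, the following is a minimal toy sketch and not the MOLLUSC code. It assumes a Gaussian symmetric displacement with standard deviation `sigma_sd` and Gaussian diffusion with variance `diffusion_rate * t` per coordinate; this parameterization and the names `simulate_sisters` and `naive_bm_rate` are illustrative assumptions, not taken from the paper or the repository. It simulates sister cells under such an SD process and then estimates the diffusion rate as if the data had come from pure Brownian motion.

```python
import random


def simulate_sisters(parent, sigma_sd, diffusion_rate, t, rng):
    """Positions of two sister cells after time t under a toy SD model (assumed form)."""
    # Symmetric displacement: daughters start at parent + d and parent - d.
    d = [rng.gauss(0.0, sigma_sd) for _ in parent]
    starts = (
        [p + di for p, di in zip(parent, d)],
        [p - di for p, di in zip(parent, d)],
    )
    # Independent diffusion: each coordinate gains N(0, diffusion_rate * t) noise.
    step_sd = (diffusion_rate * t) ** 0.5
    return tuple([x + rng.gauss(0.0, step_sd) for x in s] for s in starts)


def naive_bm_rate(pairs, t):
    """Diffusion rate that a pure Brownian-motion model would infer from sister pairs."""
    dim = len(pairs[0][0])
    total = sum(sum((a - b) ** 2 for a, b in zip(c1, c2)) for c1, c2 in pairs)
    # Under pure Brownian motion, E[|c1 - c2|^2] = 2 * dim * rate * t.
    return total / (len(pairs) * 2 * dim * t)


rng = random.Random(0)
true_rate, sigma_sd, t = 0.5, 1.0, 2.0
pairs = [simulate_sisters([0.0, 0.0], sigma_sd, true_rate, t, rng) for _ in range(20000)]
estimate = naive_bm_rate(pairs, t)
print(f"true rate = {true_rate}, naive Brownian-motion estimate = {estimate:.3f}")
```

Under these assumptions the naive estimate equals `diffusion_rate + 2 * sigma_sd**2 / t` in expectation (1.5 here, against a true rate of 0.5), because the displacement at division is absorbed into the apparent diffusion. This is one simple way to see why a Brownian-motion-only model could overestimate the diffusion rate.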