Biological and Medical Informatics Graduate Program, University of California San Francisco, San Francisco, CA, USA.
Center for Computational Biology, University of California Berkeley, Berkeley, CA, USA.
Genome Biol. 2020 Apr 14;21(1):92. doi: 10.1186/s13059-020-02000-8.
The pairing of CRISPR/Cas9-based gene editing with massively parallel single-cell readouts now enables large-scale lineage tracing. However, the rapid growth in complexity of data from these assays has outpaced our ability to accurately infer phylogenetic relationships. First, we introduce Cassiopeia, a suite of scalable maximum parsimony approaches for tree reconstruction. Second, we provide a simulation framework for evaluating algorithms and exploring lineage tracer design principles. Finally, we generate the most complex experimental lineage tracing dataset to date, 34,557 human cells continuously traced over 15 generations, and use it for benchmarking phylogenetic inference approaches. We show that Cassiopeia outperforms traditional methods by several metrics and under a wide variety of parameter regimes, and provide insight into the principles for the design of improved Cas9-enabled recorders. Together, these should broadly enable large-scale mammalian lineage tracing efforts. Cassiopeia and its benchmarking resources are publicly available at www.github.com/YosefLab/Cassiopeia.