Feng Jean, Dewitt William S, McKenna Aaron, Simon Noah, Willis Amy D, Matsen Frederick A
Department of Epidemiology and Biostatistics, University of California, San Francisco.
Department of Genome Sciences, University of Washington.
Ann Appl Stat. 2021 Mar;15(1):343-362. doi: 10.1214/20-aoas1400. Epub 2021 Mar 18.
CRISPR technology has enabled cell lineage tracing for complex multicellular organisms through insertion-deletion mutations of synthetic genomic barcodes during organismal development. To reconstruct the cell lineage tree from the mutated barcodes, current approaches apply general-purpose computational tools that are agnostic to the mutation process and are unable to take full advantage of the data's structure. We propose a statistical model for the CRISPR mutation process and develop a procedure to estimate the resulting tree topology, branch lengths, and mutation parameters by iteratively applying penalized maximum likelihood estimation. By assuming the barcode evolves according to a molecular clock, our method infers relative ordering across parallel lineages, whereas existing techniques only infer ordering for nodes along the same lineage. When analyzing transgenic zebrafish data from McKenna, Findlay and Gagnon et al. (2016), we find that our method recapitulates known aspects of zebrafish development and the results are consistent across samples.
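To illustrate what penalized maximum likelihood estimation of mutation parameters can look like, the following minimal Python sketch fits per-target edit rates in a deliberately simplified toy model. It is not the authors' model or procedure. It assumes that each barcode target is edited irreversibly at a constant rate, that all cells are observed at a common time `T`, and that a quadratic penalty pulls the log-rates toward their mean. The function name `fit_target_rates`, the penalty weight `kappa`, and the example counts are hypothetical choices made only for illustration.

```python
import math


def fit_target_rates(edited_counts, n_cells, T=1.0, kappa=0.0,
                     step=0.2, iters=20000):
    """Toy penalized ML for per-target irreversible edit rates.

    Assumed model (not from the paper): target j is edited by time T
    with probability 1 - exp(-lambda_j * T), independently across cells.
    Assumed penalty: kappa * sum_j (log lambda_j - mean log lambda)^2.
    """
    fracs = [k / n_cells for k in edited_counts]
    theta = [0.0] * len(fracs)  # theta_j = log(lambda_j)
    for _ in range(iters):
        mean_theta = sum(theta) / len(theta)
        new_theta = []
        for f, th in zip(fracs, theta):
            lam = math.exp(th)
            p_edit = -math.expm1(-lam * T)  # P(target edited by time T)
            # Derivative in lambda of the per-cell log-likelihood
            # f * log(p_edit) - (1 - f) * lam * T.
            dll_dlam = f * T * math.exp(-lam * T) / p_edit - (1.0 - f) * T
            # Chain rule to theta, minus the gradient of the penalty.
            grad = lam * dll_dlam - 2.0 * kappa * (th - mean_theta)
            new_theta.append(th + step * grad)  # gradient ascent step
        theta = new_theta
    return [math.exp(th) for th in theta]


# Hypothetical data: 100 cells, three targets edited in 10, 50, and 90 cells.
counts = [10, 50, 90]
unpenalized = fit_target_rates(counts, 100, kappa=0.0)
penalized = fit_target_rates(counts, 100, kappa=0.5)
```

With `kappa=0.0` the fit should agree with the closed-form estimate `-log(1 - k/n) / T` for each target. A positive `kappa` shrinks the estimated rates toward one another, which is the general role a penalty plays when the data alone constrain some parameters weakly.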