Yu Zhenhua, Liu Huidong, Du Fang, Tang Xiaofen
School of Information Engineering, Ningxia University, Yinchuan, China.
Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Ningxia University, Yinchuan, China.
Front Genet. 2021 Jun 4;12:692964. doi: 10.3389/fgene.2021.692964. eCollection 2021.
Single-cell sequencing (SCS) promises to reveal the landscape of genetic diversity at the single-cell level, and is particularly useful for reconstructing the evolutionary history of tumors. Multiple types of noise make SCS data notoriously error-prone and significantly complicate tumor tree reconstruction. Existing methods for tumor phylogeny estimation suffer from either high computational cost or a low-resolution picture of clonal architecture, which motivates the development of new methods for efficient and accurate reconstruction of tumor trees. We introduce GRMT (Generative Reconstruction of Mutation Tree from scratch), a method for inferring a tumor mutation tree from SCS data. GRMT exploits the k-Dollo parsimony model, which allows each mutation to be gained once and lost at most k times. Under this constraint on mutation evolution, GRMT searches for mutation tree structures by generating the tree from scratch, and implements this as an iterative process that gradually increases the tree size by introducing one new mutation at a time until a complete tree structure containing all mutations is obtained. This enables GRMT to efficiently recover the chronological order of mutations and to scale well to large datasets. Extensive evaluations on simulated and real datasets suggest that GRMT outperforms state-of-the-art methods on multiple performance metrics. The GRMT software is freely available at https://github.com/qasimyu/grmt.
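To make the idea of growing a mutation tree one mutation at a time more concrete, the following is a minimal Python sketch. It is not the GRMT implementation and is not taken from the GRMT repository. It assumes a binary cell-by-mutation matrix with no missing entries, ignores mutation loss entirely (that is, it covers only the k = 0 case of the k-Dollo model), attaches each new mutation only as a leaf, and scores trees with a simple error model whose false-positive rate `alpha` and false-negative rate `beta` are hypothetical fixed values. The function names and the frequency-based insertion order are likewise assumptions made for illustration.

```python
import math

ROOT = -1  # hypothetical marker for the mutation-free root node


def genotype(parent, node):
    """Return the set of mutations on the path from the root to `node`."""
    muts = set()
    while node != ROOT:
        muts.add(node)
        node = parent[node]
    return muts


def log_likelihood(data, parent, alpha, beta):
    """Score a partial tree: each cell is attached to its best-fitting node.

    Only mutations already placed in the tree contribute to the score.
    """
    placed = list(parent)
    nodes = [ROOT] + placed
    genos = {n: genotype(parent, n) for n in nodes}
    total = 0.0
    for row in data:
        best = -math.inf
        for n in nodes:
            ll = 0.0
            for m in placed:
                if m in genos[n]:
                    # mutation truly present: a 0 is a false negative
                    p = 1.0 - beta if row[m] else beta
                else:
                    # mutation truly absent: a 1 is a false positive
                    p = alpha if row[m] else 1.0 - alpha
                ll += math.log(p)
            best = max(best, ll)
        total += best
    return total


def grow_tree(data, alpha=0.01, beta=0.2):
    """Greedily build a mutation tree, adding one mutation per iteration.

    Returns a dict mapping each mutation index to its parent (ROOT for
    mutations attached directly to the root).
    """
    n_muts = len(data[0])
    # Assumed heuristic: place frequent (likely early) mutations first.
    order = sorted(range(n_muts), key=lambda m: -sum(row[m] for row in data))
    parent = {}
    for m in order:
        best_score, best_tree = -math.inf, None
        for anchor in [ROOT] + list(parent):
            candidate = dict(parent)
            candidate[m] = anchor  # try m as a new leaf below `anchor`
            score = log_likelihood(data, candidate, alpha, beta)
            if score > best_score:
                best_score, best_tree = score, candidate
        parent = best_tree
    return parent


# Toy data: rows are cells, columns are mutations 0, 1, 2.
data = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 0],
]
tree = grow_tree(data)
print(tree)
```

On this toy matrix, mutation 0 is placed directly under the root and mutations 1 and 2 become sibling children of mutation 0, which reflects the branching visible in the data. A method that models mutation loss would additionally need to consider loss events on tree edges when scoring each candidate placement, which this sketch leaves out.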