Joint Carnegie Mellon/University of Pittsburgh Ph.D. Program in Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America; Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America.
Intelligent Oncotherapeutics, Pittsburgh, Pennsylvania, United States of America.
PLoS Comput Biol. 2014 Jul 31;10(7):e1003740. doi: 10.1371/journal.pcbi.1003740. eCollection 2014 Jul.
We present methods to construct phylogenetic models of tumor progression at the cellular level that include copy number changes at the scale of single genes, entire chromosomes, and the whole genome. The methods are designed for data collected by fluorescence in situ hybridization (FISH), an experimental technique especially well suited to characterizing intratumor heterogeneity using counts of probes to genetic regions frequently gained or lost in tumor development. Here, we develop new provably optimal methods for computing an edit distance between the copy number states of two cells given evolution by copy number changes of single probes, all probes on a chromosome, or all probes in the genome. We then apply this theory to develop a practical heuristic algorithm, implemented in publicly available software, for inferring tumor phylogenies on data from potentially hundreds of single cells by this evolutionary model. We demonstrate and validate the methods on simulated data and published FISH data from cervical cancers and breast cancers. Our computational experiments show that the new model and algorithm lead to more parsimonious trees than prior methods for single-tumor phylogenetics and to improved performance on various classification tasks, such as distinguishing primary tumors from metastases obtained from the same patient population.
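To make the notion of an edit distance between copy number states concrete, the following is a minimal brute-force sketch, not the provably optimal method described in the abstract. It runs a breadth-first search over copy number states using three event types: a gain or loss of one copy of a single probe, a gain or loss of one copy of every probe on a chromosome, and a doubling of every probe in the genome. The function names, the `chrom_of` mapping, the `max_copy` cap, and the exact event semantics are all assumptions made for illustration, and biological constraints (for example, whether a fully lost gene can be regained) are ignored.

```python
from collections import deque


def neighbors(state, chrom_of, max_copy):
    """Yield every state reachable from `state` by one event (assumed model)."""
    n = len(state)
    # Single-probe events: gain or lose one copy of one probe.
    for i in range(n):
        for delta in (1, -1):
            c = state[i] + delta
            if 0 <= c <= max_copy:
                yield state[:i] + (c,) + state[i + 1:]
    # Chromosome events: gain or lose one copy of every probe on a chromosome.
    for ch in set(chrom_of):
        for delta in (1, -1):
            nxt = tuple(
                c + delta if chrom_of[i] == ch else c
                for i, c in enumerate(state)
            )
            if all(0 <= c <= max_copy for c in nxt):
                yield nxt
    # Whole-genome event: double every probe count (no inverse is modeled).
    doubled = tuple(2 * c for c in state)
    if all(c <= max_copy for c in doubled):
        yield doubled


def edit_distance(src, dst, chrom_of, max_copy=8):
    """Fewest events turning `src` into `dst`, or None if unreachable.

    Exhaustive BFS: only practical for a handful of probes and a small cap.
    """
    src, dst = tuple(src), tuple(dst)
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        state, dist = queue.popleft()
        for nxt in neighbors(state, chrom_of, max_copy):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical example: three probes, the first two on the same chromosome.
chrom_of = [0, 0, 1]
print(edit_distance((2, 2, 2), (4, 4, 4), chrom_of))  # one whole-genome doubling
print(edit_distance((2, 2, 2), (4, 4, 5), chrom_of))  # doubling plus one probe gain
```

Because the doubling event has no inverse in this sketch, the distance is directed: the cost from `src` to `dst` can differ from the cost in the other direction. The search is exponential in the number of probes, which is why a practical tool needs the kind of efficient distance computation and heuristic tree building the abstract describes.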