Ethan Kulman, Rui Kuang, Quaid Morris
Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America.
Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, United States of America.
PLoS Comput Biol. 2024 Dec 30;20(12):e1012653. doi: 10.1371/journal.pcbi.1012653. eCollection 2024 Dec.
Phylogenies depicting the evolutionary history of genetically heterogeneous subpopulations of cells from the same cancer, i.e., cancer phylogenies, offer valuable insights into cancer development and guide treatment strategies. Many methods reconstruct cancer phylogenies from point mutations detected with bulk DNA sequencing. However, these methods become inaccurate when reconstructing phylogenies with more than 30 mutations or, in some cases, fail to recover a phylogeny altogether. Here, we introduce Orchard, a cancer phylogeny reconstruction algorithm that remains fast and accurate with up to 1,000 mutations. Orchard samples without replacement from a factorized approximation of the posterior distribution over phylogenies, a novel result derived in this paper. Each factor in this approximate posterior corresponds to a conditional distribution for adding a new mutation to a partially built phylogeny. Orchard optimizes each factor sequentially, generating a sequence of incrementally larger phylogenies that culminates in a complete tree containing all mutations. Our evaluations demonstrate that Orchard outperforms state-of-the-art cancer phylogeny reconstruction methods, reconstructing more plausible phylogenies across 90 simulated cancers and 14 B-progenitor acute lymphoblastic leukemias (B-ALLs). Remarkably, Orchard accurately reconstructs cancer phylogenies using up to 1,000 mutations. Additionally, we demonstrate that the large and accurate phylogenies reconstructed by Orchard are useful for identifying patterns of somatic mutations and genetic variations among distinct cancer cell subpopulations.
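To make the "one factor per added mutation" idea concrete, write (in our own notation, not the paper's) T_k for the partial tree after k mutations have been placed and X for the sequencing data, so the approximation reads roughly P(T | X) ≈ ∏_{k=1..K} P(T_k | T_{k−1}, X). The Python sketch below only illustrates that incremental structure under stated assumptions. It is not Orchard: instead of sampling without replacement from an approximate posterior, it runs a deterministic beam search, and instead of a likelihood it scores partial trees with a toy "sum rule" penalty (a parent clone's prevalence should be at least the sum of its children's) on hypothetical single-sample prevalence values. The function names `sum_rule_penalty` and `beam_build`, the beam width, and the most-to-least-prevalent insertion order are all hypothetical choices.

```python
from typing import Dict, List, Tuple

# A partial tree maps each mutation (node) to its parent; node 0 is the root.
# Mutation IDs are assumed to be positive integers (0 is reserved for the root).
Tree = Dict[int, int]


def sum_rule_penalty(tree: Tree, freq: Dict[int, float]) -> float:
    """Toy score (hypothetical): total excess of children's prevalence over their parent's."""
    children_total: Dict[int, float] = {}
    for node, parent in tree.items():
        children_total[parent] = children_total.get(parent, 0.0) + freq[node]
    return sum(max(0.0, total - freq[parent]) for parent, total in children_total.items())


def beam_build(freq: Dict[int, float], beam_width: int = 3) -> List[Tuple[float, Tree]]:
    """Add mutations one at a time, keeping the `beam_width` lowest-penalty partial trees."""
    freq = {0: 1.0, **freq}  # the root clone is present in every cell
    # Hypothetical insertion order: most prevalent mutations are placed first.
    order = sorted((m for m in freq if m != 0), key=lambda m: -freq[m])
    beam: List[Tuple[float, Tree]] = [(0.0, {})]
    for mut in order:
        candidates: List[Tuple[float, Tree]] = []
        for _, tree in beam:
            # Every node already in the partial tree (including the root) may become mut's parent.
            for parent in [0, *tree]:
                extended = {**tree, mut: parent}
                candidates.append((sum_rule_penalty(extended, freq), extended))
        candidates.sort(key=lambda c: c[0])  # stable sort keeps ties in generation order
        beam = candidates[:beam_width]
    return beam


if __name__ == "__main__":
    for penalty, tree in beam_build({1: 0.9, 2: 0.5, 3: 0.3}):
        print(round(penalty, 3), tree)
```

With prevalences 0.9, 0.5 and 0.3, the two zero-penalty trees place mutation 2 under mutation 1 and mutation 3 under either 1 or 2, which matches the intuition that the sum rule alone often leaves several trees equally plausible. Keeping a beam of partial trees rather than a single greedy choice is what lets an early placement be revisited when later mutations are added.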