Schmidt Henri, Qi Yuanyuan, Raphael Benjamin J, El-Kebir Mohammed
Department of Computer Science, Princeton University, Princeton, NJ 08542, United States.
Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States.
Bioinformatics. 2025 Jul 1;41(Supplement_1):i170-i179. doi: 10.1093/bioinformatics/btaf235.
Reconstructing the evolutionary history of tumors from bulk DNA sequencing of multiple tissue samples remains a challenging computational problem, requiring simultaneous deconvolution of the tumor tissue and inference of its evolutionary history. Recently, phylogenetic reconstruction methods have made significant progress by breaking the reconstruction problem into two parts: a regression problem over a fixed topology and a search over tree space. While effective techniques have been developed for the latter search problem, the regression problem remains a bottleneck in both method design and implementation due to the lack of fast, specialized algorithms.
Here, we introduce fastppm, a fast tool to solve the perfect phylogeny regression problem via tree-structured dual dynamic programming. fastppm supports arbitrary, separable convex loss functions including the ℓ2, piecewise linear, binomial and beta-binomial loss and provides asymptotic improvements for the ℓ2 and piecewise linear loss over existing algorithms. We find that fastppm empirically outperforms both specialized and general purpose regression algorithms, obtaining 50-450× speedups while providing as accurate solutions as existing approaches. Incorporating fastppm into several phylogeny inference algorithms immediately yields up to 400× speedups, requiring only a small change to the program code of existing software. Finally, fastppm enables analysis of low-coverage bulk DNA sequencing data on both simulated data and in a patient-derived mouse model of colorectal cancer, outperforming state-of-the-art phylogeny inference algorithms in terms of both accuracy and runtime.
fastppm is implemented in C++ and available as both a command-line interface and Python library at github.com/elkebir-group/fastppm.git under an MIT license.
从多个组织样本的大量DNA测序中重建肿瘤的进化史仍然是一个具有挑战性的计算问题,需要同时对肿瘤组织进行反卷积并推断其进化史。最近,系统发育重建方法通过将重建问题分解为两部分取得了重大进展:在固定拓扑上的回归问题和在树空间中的搜索。虽然已经为后一个搜索问题开发了有效的技术,但由于缺乏快速、专门的算法,回归问题在方法设计和实现中仍然是一个瓶颈。
在这里,我们介绍了fastppm,一种通过树结构对偶动态规划来解决完美系统发育回归问题的快速工具。fastppm支持任意的、可分离的凸损失函数,包括ℓ2、分段线性、二项式和β-二项式损失,并在ℓ2和分段线性损失方面相对于现有算法提供了渐近改进。我们发现,经验上fastppm优于专门的和通用的回归算法,在提供与现有方法一样准确的解决方案的同时,实现了50至450倍的加速。将fastppm纳入几种系统发育推断算法立即产生高达400倍的加速,只需要对现有软件的程序代码进行微小更改。最后,fastppm能够在模拟数据和患者来源的结直肠癌小鼠模型上分析低覆盖度的大量DNA测序数据,在准确性和运行时间方面都优于最先进的系统发育推断算法。
fastppm用C++实现,可在github.com/elkebir-group/fastppm.git上作为命令行界面和Python库获得,遵循MIT许可。