Damti Yanir, Gronau Ilan, Moran Shlomo, Yavneh Irad
Computer Science Department, Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel.
Efi Arazi School of Computer Science, The Herzliya Interdisciplinary Center (IDC), P.O.Box 167, Herzliya 46150, Israel.
J Theor Biol. 2018 Mar 7;440:88-99. doi: 10.1016/j.jtbi.2017.12.022. Epub 2017 Dec 23.
Distance-based methods for phylogenetic reconstruction are based on a two-step approach: first, pairwise distances are computed from DNA sequences associated with a given set of taxa, and then these distances are used to reconstruct the phylogenetic relationships between taxa. Because the estimated distances are based on finite sequences, they are inherently noisy, and this noise may result in reconstruction errors. Previous attempts to improve reconstruction accuracy focused either on improving the robustness of reconstruction algorithms to this stochastic noise, or on improving the accuracy of the distance estimates. Here, we aim to further improve reconstruction accuracy by utilizing the basic observation that reconstruction algorithms are based on a series of comparisons between distances (or linear combinations of distances). We start by examining the relationship between the stochastic noise in the sequence data and the accuracy of the comparisons between pairwise distance estimates. This examination results in improved methods for distance comparison, which are shown to be as accurate as likelihood-based methods, while being much simpler and more efficient to compute. We then extend these methods to improve reconstruction accuracy of quartet trees, and examine some of the challenges moving forward.
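The two-step pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not name a substitution model or a quartet rule, so the Jukes-Cantor (JC69) distance correction and the classical four-point method used here are assumptions chosen for illustration. The function names (`jc_distance`, `pairwise_distances`, `four_point_split`) and the toy sequences are hypothetical. This is the plain baseline in which sums of distances are compared directly, not the improved comparison methods the paper proposes.

```python
import math
from itertools import combinations


def jc_distance(seq_a, seq_b):
    """Estimate the JC69 distance between two aligned, equal-length DNA sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    p = mismatches / len(seq_a)  # observed fraction of differing sites
    if p >= 0.75:
        return math.inf  # saturation: the JC69 correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def pairwise_distances(seqs):
    """Step 1: estimate a distance for every unordered pair of taxa."""
    return {
        frozenset((x, y)): jc_distance(seqs[x], seqs[y])
        for x, y in combinations(seqs, 2)
    }


def four_point_split(dist, taxa):
    """Step 2 for a quartet: pick the pairing with the smallest sum of distances."""
    a, b, c, d = taxa
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    return min(
        pairings,
        key=lambda s: dist[frozenset(s[0])] + dist[frozenset(s[1])],
    )


# Toy alignment (made up for illustration): A and B differ at one site,
# C and D differ at one site, and the two pairs differ at three or four sites.
toy_seqs = {
    "A": "ACGTACGTACGTACGTACGT",
    "B": "ACGTACGTACGTACGTACGA",
    "C": "ACGTTCGTAGGTACCTACGT",
    "D": "ACGTTCGTAGGTACCTACGC",
}
toy_dist = pairwise_distances(toy_seqs)
toy_split = four_point_split(toy_dist, ("A", "B", "C", "D"))
print(toy_split)
```

Each decision in `four_point_split` is a comparison between linear combinations of noisy distance estimates, which is the kind of comparison the abstract identifies as the source of reconstruction errors when sequences are short.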