Hong Yan, Guo Maozu, Wang Juan
School of Computer Science, Inner Mongolia University, Hohhot 010021, P.R. China.
School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, P.R. China.
Mol Ther Nucleic Acids. 2020 Nov 11;23:286-293. doi: 10.1016/j.omtn.2020.11.004. eCollection 2021 Mar 5.
Phylogenetic analysis studies the evolution of species from the characteristics of their biological sequences, and its results are generally represented as phylogenetic trees. NJ (neighbor joining) is a distance-based algorithm that is widely used to construct phylogenetic trees because it makes few assumptions, runs fast, and is highly accurate. However, NJ is known to frequently construct different phylogenetic trees for the same dataset when the taxa are input in a different order; these are known as "tied trees." This article proposes an improved version of NJ, called ENJ (extended neighbor joining). Instead of joining two nodes in each iteration, ENJ can join several nodes (currently limited to three) that share the same minimum distance into one new node, so it can construct triple phylogenetic trees. We have derived the formulas for updating the distance values and calculating the branch lengths in ENJ, and we have tested ENJ on simulated and real data. The experimental results show that, compared with other methods, the trees constructed by ENJ are more similar to the initial trees, and that ENJ is much faster than NJ. Moreover, we used ENJ to construct a phylogenetic tree for the novel coronavirus (COVID-19) and related coronaviruses, which shows that COVID-19 is more closely related to SARS-CoV than to the other coronaviruses. Because this tree differs from the existing phylogenetic trees for these coronaviruses, we also constructed a phylogenetic network for them. The network shows that these species have undergone reticulate evolution.
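Where tied trees come from can be illustrated with the selection step of standard NJ (Saitou and Nei), which joins the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the row sum of the distance matrix. The sketch below is a minimal illustration under stated assumptions, not the authors' ENJ implementation: it assumes ties are detected on the standard Q-criterion, the helpers `tied_triples` and `first_join` are hypothetical, the six-taxon matrix is invented for illustration, and ENJ's own distance-update and branch-length formulas are not reproduced.

```python
from itertools import combinations


def q_values(d):
    """Standard NJ Q-criterion for every pair i < j:
    Q(i, j) = (n - 2) * d[i][j] - r_i - r_j, with r_i the row sum of d."""
    n = len(d)
    r = [sum(row) for row in d]
    return {(i, j): (n - 2) * d[i][j] - r[i] - r[j]
            for i, j in combinations(range(n), 2)}


def tied_min_pairs(d, tol=1e-9):
    """All pairs whose Q value equals the minimum (within tol),
    listed in row-major scan order."""
    q = q_values(d)
    best = min(q.values())
    return [p for p, v in q.items() if v - best <= tol]


def tied_triples(d, tol=1e-9):
    """Hypothetical ENJ-style check: triples of taxa whose three pairs
    all attain the minimum Q (an assumption, not the paper's criterion)."""
    tied = set(tied_min_pairs(d, tol))
    return [t for t in combinations(range(len(d)), 3)
            if all(p in tied for p in combinations(t, 2))]


def first_join(d, names):
    """Hypothetical helper: the pair a plain NJ implementation that scans
    pairs in input order would join first when several pairs tie."""
    i, j = tied_min_pairs(d)[0]
    return names[i], names[j]


# Illustrative matrix (assumed, not from the paper): two clusters of three
# equidistant taxa, {A, B, C} and {D, E, F}.
names = ["A", "B", "C", "D", "E", "F"]
dist = [
    [0, 2, 2, 4, 4, 4],
    [2, 0, 2, 4, 4, 4],
    [2, 2, 0, 4, 4, 4],
    [4, 4, 4, 0, 2, 2],
    [4, 4, 4, 2, 0, 2],
    [4, 4, 4, 2, 2, 0],
]
# Same taxa and distances, presented in reverse input order.
reversed_dist = [row[::-1] for row in dist[::-1]]

print(first_join(dist, names))                 # ('A', 'B')
print(first_join(reversed_dist, names[::-1]))  # ('F', 'E')
print(tied_triples(dist))                      # [(0, 1, 2), (3, 4, 5)]
```

Every within-cluster pair has Q = -24 and every cross-cluster pair has Q = -16, so six pairs tie for the minimum. A pairwise joiner therefore picks A and B or F and E depending only on input order, whereas grouping each fully tied triple into one node removes that arbitrary choice, which is the intuition behind joining three nodes at once.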