Bhattacharyya Sourya, Mukherjee Jayanta
Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, WB, 721302, India.
J Mol Evol. 2017 Aug;85(1-2):57-78. doi: 10.1007/s00239-017-9807-7. Epub 2017 Aug 23.
We propose an extension of the distance matrix methods NJst and ASTRID for inferring species trees from gene trees that are incongruent because of incomplete lineage sorting (ILS). Both methods use the average internode distance (ID) between each pair of taxa as the distance measure. ID does not use the root of a tree, so it may not always recover the position of a taxon relative to the root. We define a novel distance measure between individual couplets (taxon pairs), the excess gene leaf count (XL), which is computed using the root of a tree. XL is proved to be additive and is shown to better infer the relative order of divergence among couplets. We propose a novel method, IDXL, which uses both the XL and ID measures for species tree construction. IDXL is shown to perform better than NJst and other distance matrix approaches on most of the biological and simulated datasets. Because it has the same computational complexity as NJst, IDXL can be applied to species tree inference on large-scale biological datasets.
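The contrast between the two measures can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: gene trees are rooted nested tuples of leaf labels, ID for a pair is taken as the number of internal nodes on the path between the two leaves in the rooted representation, and XL for a pair is taken as the number of leaves below their lowest common ancestor other than the two taxa themselves. The paper's exact definitions (for example, any normalization) may differ, and the function names `leaves`, `path_to`, `pair_measures`, and `average_measures` are hypothetical. How IDXL combines the two measures is not shown here.

```python
def leaves(tree):
    """Return the leaf labels of a nested-tuple tree (a leaf is a string)."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def path_to(tree, target):
    """Return the internal nodes from the root down to `target`, or None if absent."""
    if isinstance(tree, str):
        return [] if tree == target else None
    for child in tree:
        sub = path_to(child, target)
        if sub is not None:
            return [tree] + sub
    return None


def pair_measures(tree, x, y):
    """Return (internode distance, excess leaf count) for taxa x and y in one tree.

    Both definitions are assumptions made for illustration (see the lead-in).
    Returns None if either taxon is missing from the tree.
    """
    px, py = path_to(tree, x), path_to(tree, y)
    if px is None or py is None:
        return None
    # Count the internal nodes shared by both root-to-leaf paths.
    k = 0
    while k < min(len(px), len(py)) and px[k] is py[k]:
        k += 1
    lca = px[k - 1]  # the deepest shared node is the lowest common ancestor
    # Internal nodes on the leaf-to-leaf path: both branches below the LCA, plus the LCA.
    internode = (len(px) - k) + (len(py) - k) + 1
    # Leaves under the LCA other than x and y themselves.
    excess = len(leaves(lca)) - 2
    return internode, excess


def average_measures(trees, x, y):
    """Average both measures over the gene trees that contain x and y."""
    values = [m for m in (pair_measures(t, x, y) for t in trees) if m is not None]
    if not values:
        return None
    n = len(values)
    return sum(v[0] for v in values) / n, sum(v[1] for v in values) / n


# Two toy gene trees over the taxa A, B, C, D (hypothetical data).
t1 = ((("A", "B"), "C"), "D")
t2 = ((("A", "C"), "B"), "D")

for pair in [("A", "C"), ("C", "D")]:
    print(pair, pair_measures(t1, *pair))
print(("A", "B"), average_measures([t1, t2], "A", "B"))
```

In the tree `t1`, the pairs (A, C) and (C, D) have the same internode count under these assumptions, but their excess leaf counts differ because the pair (C, D) has its common ancestor at the root. This is the kind of root-relative information the abstract says ID alone may miss.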