Abeysundera Melanie, Kenney Toby, Field Chris, Gu Hong
Department of Mathematics and Statistics, Dalhousie University, Halifax, Canada.
PLoS One. 2014 Apr 14;9(4):e94279. doi: 10.1371/journal.pone.0094279. eCollection 2014.
We present a simple and effective method for combining distance matrices from multiple genes on identical taxon sets into a single representative distance matrix from which a combined-gene phylogenetic tree can be derived. The method applies singular value decomposition (SVD) to extract the greatest common signal present in the distances obtained from each gene. The first right singular vector of the SVD, which corresponds to a weighted average of the distance matrices of all genes, can thus be used to derive a representative tree from multiple genes. We apply our method to three well-known data sets and estimate the uncertainty using bootstrap methods. Our results show that the method works well for these three data sets and that the uncertainty in the estimates is small. A simulation study compares the performance of our method with several other distance-based approaches (namely SDM, SDM* and ACS97), and we find that the performance of all these approaches is comparable in the consensus setting. The computational complexity of our method is similar to that of SDM. Besides constructing a representative tree from multiple genes, we also demonstrate how the subsequent singular values and singular vectors may be used to identify whether there are conflicting signals in the data and which genes might be influential or outliers for the estimated combined-gene tree.
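The combination step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `combine_distance_matrices`, the layout of the stacked matrix (one row per gene, one column per taxon pair), and the choice to rescale the gene weights so that they sum to one are all assumptions made for illustration, and any per-gene normalisation the published method may apply is omitted. The sketch assumes strictly positive off-diagonal distances, so that the leading vectors of the SVD can be taken as non-negative.

```python
import numpy as np


def combine_distance_matrices(mats):
    """Combine per-gene distance matrices via the leading SVD component.

    mats: list of symmetric (n x n) distance matrices defined on the same
    taxa in the same order (hypothetical input format).
    Returns (combined matrix, gene weights, singular values).
    """
    mats = [np.asarray(m, dtype=float) for m in mats]
    n = mats[0].shape[0]
    upper = np.triu_indices(n, k=1)          # each taxon pair once

    # Assumed layout: rows are genes, columns are taxon pairs.
    A = np.vstack([m[upper] for m in mats])

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    u1 = U[:, 0]

    # Singular vectors are defined up to sign; pick the non-negative one.
    if u1.sum() < 0:
        u1 = -u1

    # u1 @ A equals s[0] * Vt[0] (up to the same sign), so a rescaled u1
    # turns the first right vector into a weighted average of the genes.
    weights = u1 / u1.sum()
    combined_vec = weights @ A

    D = np.zeros((n, n))
    D[upper] = combined_vec
    D = D + D.T                               # restore symmetry
    return D, weights, s


# Toy input: three "genes" whose distances are scaled copies of one matrix.
base = np.array([[0.0, 1.0, 2.0],
                 [1.0, 0.0, 3.0],
                 [2.0, 3.0, 0.0]])
D, weights, s = combine_distance_matrices([base, 2 * base, 3 * base])
explained = s**2 / np.sum(s**2)              # share of signal per component
```

The combined matrix `D` could then be passed to any distance-based tree-building method, such as neighbour joining. In this toy input every gene carries the same signal, so the first component accounts for essentially all of `explained`; a sizeable share in later components would be one rough indication of conflicting signal among genes, in the spirit of the diagnostic use the abstract mentions.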