Arnaoudova Elissaveta, Haws David C, Huggins Peter, Jaromczyk Jerzy W, Moore Neil, Schardl Christopher L, Yoshida Ruriko
Department of Computer Science, University of Kentucky, Lexington, KY, USA.
Front Neurosci. 2010 Aug 3;4. doi: 10.3389/fnins.2010.00047. eCollection 2010.
We propose a statistical method to test whether two phylogenetic trees with given alignments are significantly incongruent. Our method compares the two distributions of phylogenetic trees given by the two input alignments, instead of comparing point estimates of trees. This statistical approach can be applied to gene tree analysis, for example to detect unusual events in genome evolution such as horizontal gene transfer and reshuffling. Our method maps trees into a vector space and then uses the difference of means to compare the two distributions of trees. Bootstrapping alignment columns can then be applied to obtain p-values. To compute distances between means, we employ a "kernel method", which speeds up distance calculations when trees are mapped into a high-dimensional feature space, e.g., the splits or quartets feature space. In this pilot study, we first test our statistical method on data sets simulated under a coalescent model, to test whether two alignments are generated by congruent gene trees. We then apply the method to data sets of gophers and lice, grasses and their endophytes, and different fungal genes from the same genome. A companion toolkit, Phylotree, is provided to facilitate computational experiments.
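The kernel idea in the abstract can be illustrated with a minimal sketch. It is not the Phylotree implementation. It assumes each tree is encoded as the set of its nontrivial splits, so that the inner product of two 0/1 split-indicator vectors is simply the number of shared splits. The squared distance between the two sample means then follows from pairwise kernel values alone, without building the high-dimensional vectors. All function names and the toy tree samples are hypothetical.

```python
from itertools import product


def split_kernel(tree_a, tree_b):
    """Inner product of 0/1 split-indicator vectors: the number of shared splits."""
    return len(tree_a & tree_b)


def mean_kernel(sample_x, sample_y, kernel):
    """Average kernel value over all pairs (x, y), x from sample_x, y from sample_y."""
    total = sum(kernel(x, y) for x, y in product(sample_x, sample_y))
    return total / (len(sample_x) * len(sample_y))


def squared_mean_distance(sample_x, sample_y, kernel=split_kernel):
    """Squared distance between the feature-space means of two tree samples.

    Uses ||mean_x - mean_y||^2 = <mean_x, mean_x> + <mean_y, mean_y>
    - 2 <mean_x, mean_y>, where each inner product of means is an
    average of pairwise kernel values.
    """
    return (mean_kernel(sample_x, sample_x, kernel)
            + mean_kernel(sample_y, sample_y, kernel)
            - 2.0 * mean_kernel(sample_x, sample_y, kernel))


def tree(*splits):
    """Hypothetical encoding: a tree is a frozenset of splits, and each split
    is recorded as the frozenset of taxa on the side not containing taxon A."""
    return frozenset(frozenset(s) for s in splits)


# Toy unrooted trees on taxa A..E (two nontrivial splits each); assumed data.
t1 = tree("BC", "DE")
t2 = tree("BD", "CE")
t3 = tree("BC", "BCD")

# Stand-ins for the tree samples that two bootstrapped alignments would yield.
sample_x = [t1, t1, t3]
sample_y = [t2, t2, t1]

distance_sq = squared_mean_distance(sample_x, sample_y)
print(distance_sq)
```

In the setting the abstract describes, each sample would come from trees reconstructed on bootstrap resamples of alignment columns, and the observed distance would be compared against a bootstrap reference distribution to obtain a p-value. That step depends on a tree-reconstruction method and is left out of this sketch.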