Department of Computer Science, Rice University, 6100 Main Street, Houston, TX 77005, USA.
Syst Biol. 2013 Sep;62(5):738-51. doi: 10.1093/sysbio/syt037. Epub 2013 Jun 4.
Hybridization plays an important evolutionary role in several groups of organisms. A phylogenetic approach to detecting hybridization entails sequencing multiple loci across the genomes of a group of species of interest, reconstructing their gene trees, and taking the differences among these trees as indicators of hybridization. However, most methods that follow this approach ignore population effects such as incomplete lineage sorting (ILS). Given that hybridization occurs between closely related organisms, ILS may very well be at play and must therefore be accounted for in the analysis framework. To address this issue, we present a parsimony criterion for reconciling gene trees within the branches of a phylogenetic network, and a local search heuristic for inferring phylogenetic networks from collections of gene-tree topologies under this criterion. This framework enables phylogenetic analyses that account for both hybridization and ILS. Further, we propose two techniques for incorporating information about uncertainty in gene-tree estimates. Our simulation studies demonstrate that the framework performs well both in identifying the locations of hybridization events and in estimating the proportions of genes that underwent hybridization. In our experiments, the framework also handled large data sets efficiently. Finally, in analysing a yeast data set, we demonstrate issues that arise when analysing real data sets. Although a probabilistic approach was recently introduced for this problem, and although parsimonious reconciliations have accuracy issues under certain settings, our parsimony framework provides a much more computationally efficient technique for this type of analysis. Our framework now allows for genome-wide scans for hybridization that also account for ILS.
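To make the idea of a reconciliation cost concrete, here is a minimal Python sketch of the simplest special case: a network with no reticulations, which is just a species tree. It assumes that the parsimony criterion counts extra lineages (deep coalescences), in the spirit of the minimize-deep-coalescence approach. The abstract does not spell out the cost, and this sketch does not handle reticulation nodes or multiple alleles per species. Trees are written as nested tuples with species names at the leaves, with one sampled allele per species. The helper names `leaf_set`, `node_clusters`, and `extra_lineages` are hypothetical. For each species-tree branch, the number of gene lineages leaving its top equals the number of maximal gene-tree clades whose leaves all lie below that branch. Each lineage beyond the first counts as one extra lineage.

```python
from typing import Iterator, Optional, Tuple, Union

# Hypothetical representation: a rooted tree is a species name (leaf) or a tuple of subtrees.
Tree = Union[str, tuple]


def leaf_set(tree: Tree) -> frozenset:
    """Return the set of leaf labels below this node (its cluster)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def node_clusters(
    tree: Tree, parent: Optional[frozenset] = None
) -> Iterator[Tuple[frozenset, Optional[frozenset]]]:
    """Yield (cluster, parent_cluster) for every node; parent_cluster is None at the root."""
    cluster = leaf_set(tree)
    yield cluster, parent
    if not isinstance(tree, str):
        for child in tree:
            yield from node_clusters(child, cluster)


def extra_lineages(gene_tree: Tree, species_tree: Tree) -> int:
    """Count extra lineages of a gene tree reconciled within a species tree (no reticulations)."""
    gene_nodes = list(node_clusters(gene_tree))
    total = 0
    for species_cluster, _ in node_clusters(species_tree):
        # Lineages leaving the top of this species branch: maximal gene clades
        # whose leaves all belong to species below the branch.
        exiting = sum(
            1
            for cluster, parent in gene_nodes
            if cluster <= species_cluster
            and (parent is None or not parent <= species_cluster)
        )
        # One lineage is expected; every additional one is a deep coalescence.
        total += exiting - 1
    return total


species = (("A", "B"), "C")
print(extra_lineages((("A", "B"), "C"), species))  # 0: gene tree agrees with species tree
print(extra_lineages((("A", "C"), "B"), species))  # 1: discordance explained by ILS
```

Summing `extra_lineages` over a collection of gene trees gives a single score that a search procedure could minimize when comparing candidate species trees. This use is only an illustration. It is not the network search heuristic described above.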