State Key Laboratory of Rice Biology and Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Department of Plant Protection, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, China.
Institute of Insect Sciences, Department of Plant Protection, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, China.
Syst Biol. 2021 Aug 11;70(5):997-1014. doi: 10.1093/sysbio/syab011.
Topological conflict or incongruence is widespread in phylogenomic data. Concatenation- and coalescent-based approaches often result in incongruent topologies, but the causes of this conflict can be difficult to characterize. We examined incongruence stemming from conflict between the likelihood-based signal (quantified by the difference in gene-wise log-likelihood score, or $\Delta$GLS) and the quartet-based topological signal (quantified by the difference in gene-wise quartet score, or $\Delta$GQS) for every gene in three phylogenomic studies in animals, fungi, and plants, which were chosen because their concatenation-based IQ-TREE (T1) and quartet-based ASTRAL (T2) phylogenies are known to produce eight conflicting internal branches (bipartitions). By comparing the types of phylogenetic signal for all genes in these three data matrices, we found that 30-36% of genes in each data matrix are inconsistent, that is, each of these genes has a higher log-likelihood score for T1 than for T2 (i.e., $\Delta$GLS $> 0$) whereas its T1 topology has a lower quartet score than its T2 topology (i.e., $\Delta$GQS $< 0$), or vice versa. Comparison of inconsistent and consistent genes using a variety of metrics (e.g., evolutionary rate, gene tree topology, distribution of branch lengths, hidden paralogy, and gene tree discordance) showed that inconsistent genes are more likely to recover neither T1 nor T2 and have higher levels of gene tree discordance than consistent genes. Simulation analyses demonstrated that the removal of inconsistent genes from data sets with low levels of incomplete lineage sorting (ILS) and low and medium levels of gene tree estimation error (GTEE) reduced incongruence and increased accuracy. In contrast, removal of inconsistent genes from data sets with medium and high ILS levels and high GTEE levels eliminated or extensively reduced incongruence, but the resulting congruent species phylogenies were not always topologically identical to the true species trees. [Conflict; gene tree; phylogenetic signal; phylogenetics; phylogenomics; Tree of Life.]
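The consistency criterion described in the abstract can be illustrated with a minimal Python sketch. It assumes that per-gene $\Delta$GLS and $\Delta$GQS values (T1 minus T2) have already been computed elsewhere; the function names, the example gene identifiers, the example values, and the handling of zero differences as "uninformative" are all illustrative assumptions, not part of the published method.

```python
from typing import Dict, Tuple


def classify_gene(delta_gls: float, delta_gqs: float) -> str:
    """Label one gene by comparing the signs of its two signals.

    delta_gls: log-likelihood of the gene under T1 minus that under T2.
    delta_gqs: quartet score of the gene for T1 minus that for T2.
    """
    # Assumption: a zero difference favors neither tree, so the gene
    # is set aside rather than forced into either class.
    if delta_gls == 0 or delta_gqs == 0:
        return "uninformative"
    # Same sign: both signals favor the same topology.
    if (delta_gls > 0) == (delta_gqs > 0):
        return "consistent"
    # Opposite signs: likelihood and quartet signals disagree.
    return "inconsistent"


def inconsistent_fraction(scores: Dict[str, Tuple[float, float]]) -> float:
    """Fraction of all genes labeled inconsistent."""
    if not scores:
        return 0.0
    labels = [classify_gene(gls, gqs) for gls, gqs in scores.values()]
    return labels.count("inconsistent") / len(labels)


# Hypothetical values for illustration only (not data from the study).
example_scores = {
    "geneA": (2.5, 3.0),    # both signals favor T1
    "geneB": (1.2, -4.0),   # likelihood favors T1, quartets favor T2
    "geneC": (-0.8, 5.0),   # likelihood favors T2, quartets favor T1
    "geneD": (-3.1, -1.0),  # both signals favor T2
}

if __name__ == "__main__":
    for gene, (gls, gqs) in example_scores.items():
        print(gene, classify_gene(gls, gqs))
    print("inconsistent fraction:", inconsistent_fraction(example_scores))
```

Under this sketch, filtering a data matrix would amount to keeping only the genes labeled "consistent" before re-running the concatenation- and quartet-based analyses.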