Department of Integrative Biology, University of Texas, Austin, TX 78712, United States.
Plant Biology Graduate Program, University of Texas at Austin, 1 University Station (A6700), Austin, TX 78712, United States.
Mol Phylogenet Evol. 2015 Aug;89:28-36. doi: 10.1016/j.ympev.2015.03.012. Epub 2015 Apr 4.
Previous analyses of single diatom chloroplast protein-encoded genes recovered results highly incongruent with both traditional phylogenies and phylogenies derived from the nuclear-encoded small subunit (SSU) gene. Our analysis here of six individual chloroplast genes (atpB, psaA, psaB, psbA, psbC and rbcL) obtained similar anomalous results. However, phylogenetic noise in these genes did not appear to be correlated, and their concatenation appeared to effectively sum their collective signal. We empirically demonstrated the value of combining phylogenetic information profiling, partitioned Bremer support and entropy analysis in examining the utility of various partitions in phylogenetic analysis. Noise was low in the 1st and 2nd codon positions, but so was signal. Conversely, high noise levels in the 3rd codon position were accompanied by high signal. Perhaps counterintuitively, simple exclusion experiments demonstrated this was especially true at deeper nodes, where the 3rd codon position contributed most to a result congruent with morphology and SSU (and the total evidence tree here). Correlated with our empirical findings, probability of correct signal (derived from information profiling) increased and the statistical significance of substitutional saturation decreased as data were aggregated. In this regard, the aggregated 3rd codon position performed as well as or better than more slowly evolving sites. Simply put, direct methods of noise removal (elimination of fast-evolving sites) disproportionately removed signal. Information profiling and partitioned Bremer support suggest that addition of chloroplast data will rapidly improve our understanding of the diatom phylogeny, but conversely also illustrate that some parts of the diatom tree are likely to remain recalcitrant to addition of molecular data.
The methods based on information profiling have been criticized for their numerous assumptions and parameter estimates and the fact that they are based on quartets of taxa. Our empirical results support theoretical arguments that the simplifying assumptions made in these methods are robust to "real-life" situations.
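To make the codon-position partitioning concrete, the following minimal Python sketch splits an in-frame nucleotide alignment into 1st, 2nd and 3rd codon-position partitions and reports the mean per-column Shannon entropy of each. It is an illustration only, not the entropy analysis or software used in the study: the function names and the toy alignment are hypothetical, and column entropy is used here simply as a rough proxy for site variability.

```python
import math
from collections import Counter


def split_by_codon_position(alignment):
    """Split in-frame aligned sequences into three partitions,
    one per codon position (1st, 2nd, 3rd)."""
    return [[seq[offset::3] for seq in alignment] for offset in range(3)]


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column.
    Assumes uppercase bases; gaps and ambiguity codes are ignored."""
    counts = Counter(base for base in column if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mean_entropy(partition):
    """Mean column entropy across all sites of a partition."""
    columns = list(zip(*partition))
    if not columns:
        return 0.0
    return sum(column_entropy(col) for col in columns) / len(columns)


# Hypothetical toy alignment (made-up data, three codons per sequence).
toy_alignment = [
    "ATGGCAAAA",
    "ATGGCCAAG",
    "ATGGCGAAA",
    "ATGGCTAAG",
]

first, second, third = split_by_codon_position(toy_alignment)
for label, part in (("1st", first), ("2nd", second), ("3rd", third)):
    print(f"{label} codon position: mean entropy = {mean_entropy(part):.2f} bits")
```

In this toy case all variation sits in the 3rd codon position, mirroring the general pattern of faster-evolving 3rd positions. An exclusion experiment of the kind described above would drop one partition (for example `third`) before tree inference and compare the resulting topology with the full-data tree.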