von Reumont Björn M, Meusemann Karen, Szucsich Nikolaus U, Dell'Ampio Emiliano, Gowri-Shankar Vivek, Bartel Daniela, Simon Sabrina, Letsch Harald O, Stocsits Roman R, Luan Yun-xia, Wägele Johann Wolfgang, Pass Günther, Hadrys Heike, Misof Bernhard
Molecular Lab, Zoologisches Forschungsmuseum A. Koenig, Bonn, Germany.
BMC Evol Biol. 2009 May 27;9:119. doi: 10.1186/1471-2148-9-119.
Whenever different data sets arrive at conflicting phylogenetic hypotheses, only testable causal explanations of the sources of error in at least one of the data sets allow us to choose critically among the conflicting hypotheses of relationships. The large (28S) and small (18S) subunit rRNAs are among the most popular markers for studies of deep phylogenies. However, some nodes supported by these data are suspected of being artifacts caused by peculiarities in the evolution of these molecules. Arthropod phylogeny is an especially controversial subject, marked by conflicting hypotheses that depend on the data set and the method of reconstruction. We assume that phylogenetic analyses based on these genes can be improved further by i) enlarging the taxon sample, ii) employing more realistic models of sequence evolution that incorporate non-stationary substitution processes, and iii) considering covariation and pairing of sites in rRNA genes.
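To make the motivation for non-stationary models concrete, the following is a minimal sketch of a chi-square screen for heterogeneity of base composition across taxa. It is not the method used in the study; the function names `base_counts` and `composition_chi2` are hypothetical, and the test treats sites as independent and ignores phylogenetic correlation, so it serves only as a rough indicator.

```python
from collections import Counter

BASES = "ACGU"

def base_counts(seq):
    """Count A, C, G, U in one sequence (T is read as U).

    Gaps and ambiguity codes are ignored.
    """
    counts = Counter(seq.upper().replace("T", "U"))
    return [counts[b] for b in BASES]

def composition_chi2(seqs):
    """Pearson chi-square statistic for homogeneity of base composition.

    seqs maps a taxon name to its (aligned) sequence.
    Returns (statistic, degrees of freedom). The degrees of freedom
    assume that all four bases occur somewhere in the data set.
    """
    table = {name: base_counts(s) for name, s in seqs.items()}
    col_totals = [sum(row[j] for row in table.values()) for j in range(4)]
    grand = sum(col_totals)
    chi2 = 0.0
    for row in table.values():
        row_total = sum(row)
        for j in range(4):
            # Expected count if every taxon shared the pooled composition.
            expected = row_total * col_totals[j] / grand
            if expected > 0:
                chi2 += (row[j] - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(BASES) - 1)
    return chi2, dof
```

A large statistic relative to its degrees of freedom suggests that base composition differs among taxa, which is one situation in which stationary substitution models may be inadequate.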
We analyzed a large set of arthropod sequences, applied new tools for quality control of the data prior to tree reconstruction, and increased the biological realism of the substitution models. Although the split-decomposition network indicated a high noise content in the data set, our measures both improved the analyses and provided causal explanations for some of the incongruities reported in analyses of rRNA sequences. However, misleading effects did not disappear completely.
Analyses of data sets that yield ambiguous phylogenetic hypotheses demand methods that not only filter stochastic noise but also differentiate phylogenetic signal from systematic biases. Such methods can rely only on our findings regarding the evolution of the analyzed data. Analyses of independent data sets are then crucial for testing the plausibility of the results. Our approach can easily be extended to genomic data as well, establishing layers of quality assessment that are applicable to phylogenetic reconstructions in general.