Parallel Computing and Complex Systems Group, Department of Computer Science, University of Leipzig, Augustusplatz 10, D-04109 Leipzig, Germany.
Mol Phylogenet Evol. 2013 Nov;69(2):352-64. doi: 10.1016/j.ympev.2013.05.002. Epub 2013 May 16.
About 2800 metazoan mitochondrial genomes are currently available in NCBI RefSeq, two thirds of them from vertebrates. Metazoan phylogeny was recently challenged by large-scale EST approaches (phylogenomics), which stabilized classical nodes while simultaneously supporting new sister-group hypotheses. The use of mitochondrial data in deep phylogeny analyses has often been criticized because of high nucleotide substitution rates, large differences in amino acid substitution rates between taxa, and biases in nucleotide frequencies. Nevertheless, mitochondrial genome data might still be promising, because they allow for a larger taxon sampling, albeit with a smaller amount of sequence information per taxon. We present the most comprehensive analysis of bilaterian relationships based on mitochondrial genome data. The analyzed data set comprises more than 650 mitochondrial genomes, chosen to represent a broad sample of both phylogenetic and sequence diversity. The results are based on high-quality amino acid alignments obtained from a complete reannotation of the mitogenomic sequences in the NCBI RefSeq database. However, the results fail to support many otherwise undisputed high-ranking taxa, such as Mollusca, Hexapoda, and Arthropoda, and suffer from extremely long branches of Nematoda, Platyhelminthes, and some other taxa. To identify the sources of misleading phylogenetic signals, we discuss several problems associated with mitochondrial genome data sets, e.g. the nucleotide and amino acid landscapes and a strong correlation between gene rearrangements and long branches.