Le Vinh Sy, Dang Cuong Cao, Le Quang Si
University of Engineering and Technology, Vietnam National University Hanoi, Hanoi, Vietnam.
School of Pharmacy and Biomedical Sciences, University of Portsmouth, Winston Churchill Avenue, Portsmouth, PO1 2UP, UK.
BMC Evol Biol. 2017 Jun 12;17(1):136. doi: 10.1186/s12862-017-0987-y.
Amino acid substitution models play an essential role in inferring phylogenies from mitochondrial protein data. However, only a few empirical models exist, and they were estimated from limited mitochondrial protein data covering a hundred species. These existing models are unlikely to represent appropriately the amino acid substitutions found in hundreds of thousands of metazoan mitochondrial protein sequences.
We selected 125,935 mitochondrial protein sequences from 34,448 metazoan species to estimate new amino acid substitution models targeting metazoa, vertebrates and invertebrates. The new models yield phylogenies with significantly better likelihoods than those obtained with the existing models. Phylogenies inferred with the existing models lie at remarkable distances from the maximum likelihood phylogenies, which indicates a considerable number of incorrect bipartitions in the phylogenies inferred with the existing models. Finally, we used the new models and mitochondrial protein data to confirm that Testudines, Aves, and Crocodylia form a separate clade within amniotes.
We introduced new amino acid substitution models for metazoan mitochondrial proteins. The new models outperform the existing models in inferring phylogenies from metazoan mitochondrial protein data. We strongly recommend that researchers use the new models when analysing metazoan mitochondrial protein data.