Influence of substitution model selection on protein phylogenetic tree reconstruction.

Author information

Del Amparo Roberto, Arenas Miguel

Affiliations

CINBIO, Universidade de Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain.

CINBIO, Universidade de Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain; Galicia Sur Health Research Institute (IIS Galicia Sur), 36310 Vigo, Spain.

Publication information

Gene. 2023 May 20;865:147336. doi: 10.1016/j.gene.2023.147336. Epub 2023 Mar 3.

DOI:10.1016/j.gene.2023.147336

PMID:36871672

Abstract

Probabilistic phylogenetic tree reconstruction is traditionally performed under a best-fitting substitution model of molecular evolution, selected beforehand according to one of several statistical criteria. Some recent studies have proposed that this selection step is unnecessary for phylogenetic tree reconstruction, which has led to a debate in the field. In contrast to DNA sequences, phylogenetic tree reconstruction from protein sequences is traditionally based on empirical exchangeability matrices that can differ among taxonomic groups and protein families. With this in mind, we investigated the influence of selecting a substitution model of protein evolution on phylogenetic tree reconstruction by analyzing real and simulated data. We found that phylogenetic trees reconstructed under a selected best-fitting substitution model of protein evolution are the most accurate, in terms of topology and branch lengths, compared with those reconstructed under substitution models whose amino acid replacement matrices are far from that of the selected best-fitting model, especially when the data have large genetic diversity. Indeed, we found that substitution models with similar amino acid replacement matrices produce similar reconstructed phylogenetic trees, which suggests using a substitution model as similar as possible to the selected best-fitting model when the latter cannot be used. Therefore, we recommend the traditional protocol of selecting among substitution models of evolution for protein phylogenetic tree reconstruction.
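The two ideas in the abstract, picking a best-fitting model by a statistical criterion and falling back to the model with the most similar replacement matrix, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the model names (`model_A`, `model_B`, `model_C`), the log-likelihoods, the 3x3 toy matrices (real amino acid matrices are 20x20), and the Euclidean matrix distance are hypothetical and are not taken from the study, which may use different criteria and a different distance.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion; lower values indicate a better fit."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_sites):
    """Bayesian information criterion; lower values indicate a better fit."""
    return n_params * math.log(n_sites) - 2 * log_lik


def best_fitting(fits, n_sites):
    """Return the model name with the lowest BIC.

    fits maps a model name to (log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda m: bic(fits[m][0], fits[m][1], n_sites))


def matrix_distance(a, b):
    """Euclidean distance between the upper triangles of two symmetric
    exchangeability matrices, each rescaled so its entries sum to 1.
    This distance is an illustrative choice, not the study's metric."""
    n = len(a)
    ua = [a[i][j] for i in range(n) for j in range(i + 1, n)]
    ub = [b[i][j] for i in range(n) for j in range(i + 1, n)]
    sa, sb = sum(ua), sum(ub)
    return math.sqrt(sum((x / sa - y / sb) ** 2 for x, y in zip(ua, ub)))


def closest_model(target, candidates, matrices):
    """Return the candidate whose matrix is nearest to the target's matrix."""
    return min(
        candidates,
        key=lambda m: matrix_distance(matrices[target], matrices[m]),
    )


# Hypothetical fits for an alignment of 300 sites (values are made up).
fits = {
    "model_A": (-10250.0, 0),
    "model_B": (-10180.0, 0),
    "model_C": (-10175.0, 19),
}

# Hypothetical 3x3 symmetric exchangeability matrices (toy size).
matrices = {
    "model_A": [[0, 1, 2], [1, 0, 3], [2, 3, 0]],
    "model_B": [[0, 1, 2], [1, 0, 4], [2, 4, 0]],
    "model_C": [[0, 5, 1], [5, 0, 1], [1, 1, 0]],
}

best = best_fitting(fits, n_sites=300)
# Suppose the best-fitting model is not implemented in the tree-building
# software: fall back to the remaining model with the most similar matrix.
fallback = closest_model(best, [m for m in fits if m != best], matrices)
print(best, fallback)
```

In this toy setting the fallback is chosen purely by matrix similarity, which mirrors the abstract's suggestion to use a model as close as possible to the selected one when the selected model itself cannot be used.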
