Olsen Rolf, Loomis William F
Department of Physics, University of California at San Diego, La Jolla, CA 92093, USA.
J Mol Evol. 2005 Nov;61(5):659-65. doi: 10.1007/s00239-005-0060-0. Epub 2005 Oct 20.
Sequence divergence among orthologous proteins was characterized with 34 amino acid replacement matrices, sequence context analysis, and a phylogenetic tree. The model was trained on very large datasets of aligned protein sequences drawn from 15 organisms including protists, plants, Dictyostelium, fungi, and animals. Comparative tests with models currently used in phylogeny, i.e., with JTT+gamma+/-F and WAG+gamma+/-F, made on a test dataset of 380 multiple alignments containing protein sequences from all five of the major taxonomic groups mentioned, indicate that our model should be preferred over the JTT+gamma+/-F and WAG+gamma+/-F models on datasets similar to the test dataset. The strong performance of our model of orthologous protein sequence divergence can be attributed to its ability to better approximate amino acid equilibrium frequencies to compositions found in alignment columns.
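The abstract attributes the model's performance to how well its equilibrium frequencies approximate the amino acid compositions seen in alignment columns. As a rough illustration of what a column composition is, the sketch below tallies per-column amino acid frequencies for a small alignment. It is not the authors' method: the function name `column_compositions`, the toy alignment, and the choice to ignore gaps and ambiguous characters are all assumptions made for illustration.

```python
from collections import Counter

# The 20 standard amino acids; anything else (gaps, X, etc.) is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_compositions(alignment):
    """Return one dict per alignment column mapping amino acid -> observed frequency.

    `alignment` is a list of equal-length strings (hypothetical input format).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    compositions = []
    for col in range(length):
        # Count only standard residues in this column.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values())
        # A column made up entirely of gaps yields an empty composition.
        compositions.append(
            {aa: n / total for aa, n in counts.items()} if total else {}
        )
    return compositions


# Toy alignment, invented for this example only.
toy_alignment = ["MKV-", "MRVA", "MKIA"]
comps = column_compositions(toy_alignment)
```

A site-specific substitution model would then be judged by how closely its equilibrium frequencies for a site track compositions like these, rather than a single set of frequencies shared across the whole alignment.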