Rizzato Francesca, Rodriguez Alex, Biarnés Xevi, Laio Alessandro
Scuola Internazionale Superiore di Studi Avanzati (SISSA), 34136 Trieste, Italy.
Laboratory of Biochemistry, Institut Químic de Sarrià (IQS), Universitat Ramon Llull (URL), 08017 Barcelona, Spain.
Genetics. 2017 Oct;207(2):643-652. doi: 10.1534/genetics.117.300078. Epub 2017 Jul 28.
Fast genome sequencing offers invaluable opportunities for building updated and improved models of protein sequence evolution. Here we show that Single Nucleotide Polymorphisms (SNPs) can be used to build a model capable of predicting the probability of substitution between amino acids in variants of the same protein in different species. The model is based on a substitution matrix inferred from the frequency of codon interchanges observed in a suitably selected subset of human SNPs. It predicts the substitution probabilities observed in alignments between Homo sapiens and related species at 85-100% sequence identity better than any other approach we are aware of. The model gradually loses its predictive power at lower sequence identity. Our results suggest that SNPs can be employed, together with multiple sequence alignment data, to model protein sequence evolution. The SNP-based substitution matrix developed in this work can be exploited to better align protein sequences of related organisms and to refine the estimate of the evolutionary distance between protein variants from related species in phylogenetic trees. In the longer term, it might become a useful tool for population analysis.
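As an illustration of the general idea of turning observed codon interchanges into amino acid substitution probabilities, here is a minimal Python sketch. It is not the authors' procedure: the abstract does not describe how the SNP subset is selected or how the matrix is inferred. The function name `substitution_probabilities`, the symmetric counting of each codon pair, the skipping of stop codons, and the toy input are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def substitution_probabilities(codon_pairs):
    """Row-normalised amino acid substitution frequencies.

    codon_pairs: iterable of (codon, codon) tuples, one per SNP.
    Assumption: the direction of each SNP is unknown, so every pair is
    counted in both directions. Pairs involving a stop codon are skipped.
    """
    counts = defaultdict(Counter)
    for first, second in codon_pairs:
        aa1, aa2 = CODON_TABLE[first], CODON_TABLE[second]
        if "*" in (aa1, aa2):
            continue
        counts[aa1][aa2] += 1
        counts[aa2][aa1] += 1
    # Divide each row by its total so that it sums to 1.
    return {
        aa: {other: n / sum(row.values()) for other, n in row.items()}
        for aa, row in counts.items()
    }


# Toy input (hypothetical, not real SNP data).
pairs = [("GCT", "GTT"), ("GCT", "GCC"), ("GAT", "GAA")]
probs = substitution_probabilities(pairs)
```

With this toy input, `probs["A"]` holds the relative frequencies with which alanine codons were seen paired with alanine (synonymous) or valine codons. A realistic model would need far more data, a principled SNP selection, and a treatment of codon and amino acid background frequencies.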