Chi Peter B, Kim Dohyup, Lai Jason K, Bykova Nadia, Weber Claudia C, Kubelka Jan, Liberles David A
Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, 19122.
Department of Mathematics and Computer Science, Ursinus College, Collegeville, Pennsylvania, 19426.
Proteins. 2018 Feb;86(2):218-228. doi: 10.1002/prot.25429. Epub 2017 Dec 12.
Improvements in the description of amino acid substitution are required to develop better pseudo-energy-based, protein structure-aware models for use in phylogenetic studies. These models are used to characterize the probabilities of amino acid substitution and enable better simulation of protein sequences over a phylogeny. A better characterization of amino acid substitution probabilities in turn enables numerous downstream applications, such as detecting positive selection, ancestral sequence reconstruction, and evolutionarily motivated protein engineering. Many existing Markov models for amino acid substitution in molecular evolution disregard molecular structure and describe the amino acid substitution process over longer evolutionary periods poorly. Here, we present a new model upgraded with a site-specific parameterization of pseudo-energy terms in a coarse-grained force field. It describes local heterogeneity in physical constraints on amino acid substitution better than a previous pseudo-energy-based model, at minimal cost in runtime. The importance of each weight term parameterization in characterizing underlying features of the site, including contact number, solvent accessibility, and secondary structural elements, was evaluated, returning both expected and biologically reasonable relationships between model parameters. As a result, the proposed amino acid substitutions that the model accepts more closely resemble the site-specific frequencies observed in gene family alignments. The modular site-specific pseudo-energy function is available for download at https://liberles.cst.temple.edu/Software/CASS/index.html.