Roles of solvent accessibility and gene expression in modeling protein sequence evolution.

Author information

Wang Kuangyu, Yu Shuhui, Ji Xiang, Lakner Clemens, Griffing Alexander, Thorne Jeffrey L

Affiliations

Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA.

Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA; College of Life Science, Chongqing University, Chongqing, China.

Publication information

Evol Bioinform Online. 2015 Apr 29;11:85-96. doi: 10.4137/EBO.S22911. eCollection 2015.

Abstract

Models of protein evolution tend to ignore functional constraints, although structural constraints are sometimes incorporated. Here we propose a probabilistic framework for codon substitution that evaluates the joint effects of relative solvent accessibility (RSA), a structural constraint, and gene expression, a functional constraint. First, we explore the relationship between RSA and codon usage at the genomic scale as well as at the scale of individual genes. Motivated by these results, we construct our framework by determining how probable an amino acid is, given RSA and gene expression, and then evaluating the probability of observing a codon relative to its synonymous codons. We come to the biologically plausible conclusion that both RSA and gene expression are related to amino acid frequencies but that, among synonymous codons, the relative probability of a particular codon is more closely related to gene expression than to RSA. To illustrate the potential applications of our framework, we propose a new codon substitution model. Using this model, we obtain estimates of 2Ns, the product of the effective population size N and the relative fitness difference s between alleles. For a training data set consisting of human proteins with known structures and expression data, 2Ns is estimated separately for synonymous and nonsynonymous substitutions in each protein. We then contrast the patterns of synonymous and nonsynonymous 2Ns estimates across proteins while also taking the gene expression levels of the proteins into account. We conclude that our 2Ns estimates are too concentrated around 0, and we discuss potential explanations for this lack of variability.
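
The two-stage construction described in the abstract (first the probability of an amino acid given RSA and gene expression, then the probability of a codon among its synonymous alternatives) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy observations, the binning of RSA and expression into categories, and the names `conditional_frequencies` and `codon_probability` are all assumptions made for illustration, and the estimates are plain relative frequencies.

```python
from collections import Counter, defaultdict

# Hypothetical toy observations: (rsa_bin, expression_bin, amino_acid, codon).
# Real data would come from solved structures and expression measurements.
observations = [
    ("buried", "high", "L", "CTG"),
    ("buried", "high", "L", "CTG"),
    ("buried", "high", "L", "CTC"),
    ("buried", "high", "K", "AAG"),
    ("exposed", "high", "K", "AAG"),
    ("exposed", "high", "K", "AAA"),
    ("exposed", "low", "K", "AAA"),
    ("exposed", "low", "L", "TTA"),
]


def conditional_frequencies(pairs):
    """Estimate P(outcome | context) from (context, outcome) pairs by counting."""
    counts = defaultdict(Counter)
    for context, outcome in pairs:
        counts[context][outcome] += 1
    return {
        context: {o: n / sum(c.values()) for o, n in c.items()}
        for context, c in counts.items()
    }


# Stage 1: P(amino acid | RSA bin, expression bin).
p_aa = conditional_frequencies(
    ((rsa, expr), aa) for rsa, expr, aa, _ in observations
)

# Stage 2: P(codon | amino acid, RSA bin, expression bin), i.e. the
# probability of a codon relative to its synonymous alternatives.
p_codon = conditional_frequencies(
    ((rsa, expr, aa), codon) for rsa, expr, aa, codon in observations
)


def codon_probability(rsa, expr, aa, codon):
    """Combine both stages: P(aa | context) * P(codon | aa, context)."""
    stage1 = p_aa.get((rsa, expr), {}).get(aa, 0.0)
    stage2 = p_codon.get((rsa, expr, aa), {}).get(codon, 0.0)
    return stage1 * stage2


print(codon_probability("buried", "high", "L", "CTG"))
```

Because the abstract reports that synonymous codon choice tracks gene expression more closely than RSA, a variant of this sketch could condition the second stage on expression alone; the version here keeps both covariates in both stages only for symmetry.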

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ac7/4415675/5e8118318b1c/ebo-11-2015-085f1.jpg
