Yeh So-Wei, Huang Tsun-Tsao, Liu Jen-Wei, Yu Sung-Huan, Shih Chien-Hua, Hwang Jenn-Kang, Echave Julian
Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 30050, Taiwan; Center for Bioinformatics Research, National Chiao Tung University, Hsinchu 30050, Taiwan.
Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, Martín de Irigoyen 3100, San Martín, 1650 Buenos Aires, Argentina.
Biomed Res Int. 2014;2014:572409. doi: 10.1155/2014/572409. Epub 2014 Jul 9.
Functional and biophysical constraints result in site-dependent patterns of protein sequence variability. It is commonly assumed that the key structural determinant of site-specific rates of evolution is the Relative Solvent Accessibility (RSA). However, a recent study found that amino acid substitution rates correlate better with two Local Packing Density (LPD) measures, the Weighted Contact Number (WCN) and the Contact Number (CN), than with RSA. This work aims at a more thorough assessment. To this end, in addition to substitution rates, we considered four other sequence variability scores, four measures of solvent accessibility (SA), and other CN measures. We compared all properties for each protein of a structurally and functionally diverse representative dataset of monomeric enzymes. We show that the best sequence variability measures take into account phylogenetic tree topology. More importantly, we show that both LPD measures (WCN and CN) correlate better with sequence variability than all of the SA measures, regardless of the sequence variability score used. Moreover, the independent contribution of the best LPD measure is approximately four times larger than that of the best SA measure. This study strongly supports the conclusion that a site's packing density rather than its solvent accessibility is the main structural determinant of its rate of evolution.
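The abstract names the two LPD measures without giving their formulas. As a rough illustration only, the sketch below computes them from per-residue coordinates (for example, C-alpha positions), assuming the commonly used definitions: CN as the number of other residues within a cutoff radius, and WCN as the sum of inverse squared distances to all other residues. The function names and the 13 Å default cutoff are hypothetical choices for this example and are not taken from the paper.

```python
import numpy as np


def pairwise_sq_distances(coords):
    """Return the N x N matrix of squared distances between residues."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return (diff ** 2).sum(axis=-1)


def contact_number(coords, cutoff=13.0):
    """CN sketch: count other residues within `cutoff` (assumed value)."""
    sq = pairwise_sq_distances(coords)
    within = sq <= cutoff ** 2
    np.fill_diagonal(within, False)  # a residue is not its own contact
    return within.sum(axis=1)


def weighted_contact_number(coords):
    """WCN sketch: sum over all other residues of 1 / r_ij^2 (assumed form)."""
    sq = pairwise_sq_distances(coords)
    np.fill_diagonal(sq, np.inf)  # exclude the self term (1 / inf == 0)
    return (1.0 / sq).sum(axis=1)


# Toy example: three residues on a line at x = 0, 1, and 3.
toy = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
toy_cn = contact_number(toy, cutoff=1.5)
toy_wcn = weighted_contact_number(toy)
```

In this toy case the middle residue has the highest WCN because it is close to both neighbours, while the isolated residue at x = 3 has no contacts within the 1.5 cutoff. Per-site profiles computed this way could then be correlated with a sequence variability score for each protein, which is the kind of comparison the abstract describes.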