Liou Yi-Fan, Huang Hui-Ling, Ho Shinn-Ying
Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan.
Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan.
BMC Bioinformatics. 2016 Dec 22;17(Suppl 19):503. doi: 10.1186/s12859-016-1368-z.
Most hydrophilic residues are thought to be exposed on the protein surface, and most hydrophobic residues are thought to be buried in the interior. Whereas the majority of existing studies of protein folding characteristics rely on protein structures, this study aimed to design predictors that estimate the relative solvent accessibility (RSA) of amino acid residues, so that folding characteristics can be discovered from sequences.
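RSA is conventionally defined as a residue's solvent-accessible surface area (ASA) divided by a reference maximum ASA for that residue type, expressed as a percentage. The following is a minimal sketch of that normalization. The function name and the numbers in the usage line are illustrative assumptions and are not taken from the study.

```python
def relative_solvent_accessibility(asa, max_asa):
    """Return RSA as a percentage of the reference maximum, clipped to [0, 100].

    asa     -- observed solvent-accessible surface area of the residue
    max_asa -- reference maximum ASA for that residue type (same units)
    """
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    rsa = 100.0 * asa / max_asa
    # Observed ASA can slightly exceed the reference value, so clip the result.
    return min(max(rsa, 0.0), 100.0)


# Hypothetical values for illustration only (not from the paper).
example_rsa = relative_solvent_accessibility(asa=30.0, max_asa=120.0)
print(example_rsa)
```

Under this definition a low RSA corresponds to a buried residue and a high RSA to an exposed one.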
The 20 proposed real-valued RSA predictors were built with support vector regression, using a set of informative physicochemical properties (PCPs) selected by an optimal feature selection algorithm. Molecular dynamics simulations were then performed to validate the knowledge discovered by analyzing the selected PCPs.
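The abstract does not say how PCPs are turned into regression inputs. One common approach, assumed here purely for illustration, is to look up a property value for each residue in a window centered on the target position. The function `window_features`, the window size, the padding value, and the property scale below are all hypothetical and are not the authors' encoding.

```python
def window_features(sequence, center, scale, half_width=2, pad_value=0.0):
    """Encode the residues around `center` as a list of property values.

    sequence   -- amino acid sequence as a string of one-letter codes
    center     -- 0-based index of the residue whose RSA would be predicted
    scale      -- dict mapping one-letter codes to a physicochemical value
    half_width -- number of neighbors taken on each side (assumption)
    pad_value  -- value used beyond the sequence ends or for unknown residues
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(sequence):
            features.append(scale.get(sequence[pos], pad_value))
        else:
            features.append(pad_value)
    return features


# Made-up property scale for illustration only; not a published PCP index.
toy_scale = {"A": 0.5, "L": 1.0, "K": -1.0, "D": -0.8}
vec = window_features("ALKD", center=0, scale=toy_scale, half_width=1)
print(vec)
```

With several selected PCPs, one such vector per property could be concatenated to form the input to a regressor.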
The RSA predictors achieved a mean absolute error of 14.11% and a correlation coefficient of 0.69, outperforming existing predictors. As prediction features, the hydrophilic-residue predictors preferred PCPs of buried amino acid residues to PCPs of exposed ones. Analysis of the PCPs used by the predictors for hydrophobic residues revealed a hydrophobic spine composed of the exposed hydrophobic residues of an α-helix. For example, molecular dynamics simulations of wild-type sequences and their mutants showed that the proteins 1MOF and 2WRP_H16I (Protein Data Bank IDs), which have a perfectly hydrophobic spine, have more stable structures than 1MOF_I54D and 2WRP, which do not.
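The two reported metrics can be sketched as follows, assuming the correlation coefficient is the Pearson coefficient between observed and predicted RSA values (the abstract does not name the variant). The sample values are invented for illustration and are not data from the study.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


# Invented RSA percentages for illustration only.
observed = [10.0, 40.0, 75.0, 20.0]
predicted = [15.0, 35.0, 60.0, 30.0]
mae = mean_absolute_error(observed, predicted)
r = pearson_correlation(observed, predicted)
print(mae, r)
```

Because RSA is expressed as a percentage, the mean absolute error is in percentage points, which is how the 14.11% figure reads.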
We identified informative PCPs, used them to design high-performance RSA predictors, and analyzed them to identify novel protein folding characteristics. A hydrophobic spine in a protein can help to stabilize exposed α-helices.