Zhang Hua, Kurgan Lukasz
School of Computer and Information Engineering, Zhejiang Gongshang University, 310018, Hangzhou, Zhejiang, People's Republic of China,
Amino Acids. 2014 Dec;46(12):2665-80. doi: 10.1007/s00726-014-1817-9. Epub 2014 Aug 9.
Knowledge of protein flexibility is vital for deciphering the corresponding functional mechanisms. It would help, for instance, in improving computational drug design and refinement in homology-based modeling. We propose a new predictor of residue flexibility, expressed as B-factors, from protein chains. It uses local (in the chain) predicted (or native) relative solvent accessibility (RSA) and custom-derived amino acid (AA) alphabets. The predictor is implemented as a two-stage linear regression model whose inputs are an RSA-based space in a local sequence window in the first stage and a reduced AA pair-based space in the second stage. The method is easy to comprehend because it has an explicit linear form in both stages. Particle swarm optimization was used to find an optimal reduced AA alphabet that simplifies the input space and improves the prediction performance. The average correlation coefficients between the native and predicted B-factors, measured on a large benchmark dataset, improve from 0.65 to 0.67 when using the native RSA values and from 0.55 to 0.57 when using the predicted RSA values. Blind tests on two independent datasets show consistent, modest improvements of 0.02 in the average correlation coefficients for both native and predicted RSA-based predictions.
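The two-stage idea can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the four-group reduced alphabet in GROUPS is hypothetical (the paper derives its alphabet with particle swarm optimization), the second stage is assumed to fit the residual of the first stage, AA pairs are taken as a residue and its right-hand neighbour, off-chain window positions are padded with 0.0, and a tiny ridge term is added only to keep the normal equations solvable. All function names are invented for this sketch.

```python
from statistics import mean

# Hypothetical 4-group reduced alphabet (NOT the PSO-optimized one from the paper).
GROUPS = ["AVLIMC", "FWYH", "STNQGP", "DEKR"]
AA_TO_GROUP = {aa: g for g, letters in enumerate(GROUPS) for aa in letters}


def rsa_window(rsa, i, half_width):
    """RSA values in a window centred on residue i; off-chain positions become 0.0."""
    return [rsa[j] if 0 <= j < len(rsa) else 0.0
            for j in range(i - half_width, i + half_width + 1)]


def pair_one_hot(seq, i):
    """One-hot code of the reduced-alphabet pair (residue i, residue i+1)."""
    n = len(GROUPS)
    vec = [0.0] * (n * n)
    nxt = seq[i + 1] if i + 1 < len(seq) else seq[i]  # assumption: last residue pairs with itself
    vec[AA_TO_GROUP[seq[i]] * n + AA_TO_GROUP[nxt]] = 1.0
    return vec


def fit_linear(rows, targets, ridge=1e-6):
    """Least-squares weights (bias last) from ridge-stabilised normal equations."""
    xs = [row + [1.0] for row in rows]
    d = len(xs[0])
    # Augmented matrix [X^T X + ridge * I | X^T y].
    a = [[sum(x[p] * x[q] for x in xs) + (ridge if p == q else 0.0) for q in range(d)]
         + [sum(x[p] * t for x, t in zip(xs, targets))] for p in range(d)]
    for col in range(d):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, d), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(d):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[p][d] / a[p][p] for p in range(d)]


def predict(rows, weights):
    """Explicit linear form: weighted sum of the inputs plus the bias."""
    return [sum(w * v for w, v in zip(weights, row + [1.0])) for row in rows]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def two_stage_fit_predict(seq, rsa, b_factors, half_width=1):
    """Fit both stages on one chain and return (stage-1, final) in-sample predictions."""
    idx = range(len(seq))
    stage1_rows = [rsa_window(rsa, i, half_width) for i in idx]
    stage1 = predict(stage1_rows, fit_linear(stage1_rows, b_factors))
    residuals = [b - p for b, p in zip(b_factors, stage1)]
    stage2_rows = [pair_one_hot(seq, i) for i in idx]
    correction = predict(stage2_rows, fit_linear(stage2_rows, residuals))
    return stage1, [p + c for p, c in zip(stage1, correction)]


if __name__ == "__main__":
    # Made-up toy chain, RSA values and normalized B-factors, for illustration only.
    seq = "ADKFSLEGWTRVNYQH"
    rsa = [0.1, 0.8, 0.9, 0.2, 0.5, 0.05, 0.7, 0.6,
           0.15, 0.4, 0.85, 0.1, 0.55, 0.3, 0.65, 0.45]
    b = [-0.9, 0.7, 1.1, -0.6, 0.1, -1.2, 0.6, 0.4,
         -0.8, -0.1, 1.0, -1.0, 0.3, -0.4, 0.5, 0.2]
    stage1, final = two_stage_fit_predict(seq, rsa, b)
    print("stage 1 correlation:", pearson(b, stage1))
    print("final correlation:  ", pearson(b, final))
```

The sketch fits and evaluates on the same toy chain, so the correlations it prints say nothing about generalization; the figures reported in the abstract come from benchmark and independent test datasets.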