Liu Lijun, He Deyong, Yang Shaoming, Xu Yaping
School of Chemistry and Chemical Engineering, Jinggangshan University, Ji'an 343009, China.
Protein Pept Lett. 2010 Feb;17(2):246-53. doi: 10.2174/092986610790226085.
In this study, we used two categories of molecular descriptors, CODESSA and DPPS (divided physicochemical property scores of amino acids), to parameterize the structural characteristics of a set of 2015 decapeptides that bind the human amphiphysin SH3 domain, at the atom and residue levels, respectively. On that basis, several robust quantitative structure-affinity relationship (QSAR) models were constructed using partial least squares regression (PLS) and least squares support vector machine (LSSVM), each coupled with genetic algorithm (GA)-based variable selection. The results show that (1) GA is a powerful variable-selection tool that efficiently identifies the most informative variable combinations for PLS and LSSVM modeling; (2) regression models built with the nonlinear LSSVM approach are more robust and predictive than those built with the linear PLS method; and (3) the residue-level descriptor (DPPS) captures peptide structural characteristics better, and is more suitable, than the atom-level descriptor (CODESSA). Analysis of the optimal DPPS-based GA-LSSVM model indicates that the core motif of SH3 domain-binding peptides contributes significantly to binding affinity, whereas the two terminal residues, especially the N-terminal residue, have little effect on the binding process.
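The abstract does not state the kernel, hyperparameters, GA settings, or fitness criterion the authors used. The following is therefore only a minimal, hypothetical Python/NumPy sketch of the general GA-LSSVM technique: an LS-SVM regressor with an RBF kernel (solved through its linear dual system), wrapped in a genetic-algorithm search over binary descriptor masks, with each mask scored by k-fold cross-validated mean squared error. All function names (`lssvm_fit`, `cv_mse`, `ga_select`, etc.), the RBF kernel form, the hyperparameters (`gamma`, `sigma`, population size, mutation rate) and the fitness function are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

# Hypothetical sketch of GA-based variable selection for LS-SVM regression.
# Kernel, hyperparameters and fitness criterion are assumptions, not the authors' settings.


def rbf_kernel(A, B, sigma):
    # Gaussian (RBF) kernel: exp(-||a - b||^2 / (2 * sigma^2))
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM dual system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return {"X": X, "b": sol[0], "alpha": sol[1:], "sigma": sigma, "gamma": gamma}


def lssvm_predict(model, Xnew):
    # f(x) = sum_i alpha_i * K(x, x_i) + b
    return rbf_kernel(Xnew, model["X"], model["sigma"]) @ model["alpha"] + model["b"]


def cv_mse(X, y, mask, k=5, **kw):
    """k-fold cross-validated MSE using only the descriptor columns where mask is True.

    Folds are contiguous, so rows are assumed to be in random order already.
    """
    if not mask.any():
        return np.inf  # an empty descriptor subset is not a valid model
    idx = np.arange(len(y))
    err = 0.0
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        m = lssvm_fit(X[train][:, mask], y[train], **kw)
        err += ((lssvm_predict(m, X[test][:, mask]) - y[test]) ** 2).sum()
    return err / len(y)


def ga_select(X, y, pop_size=20, generations=15, p_mut=0.05, seed=0, **kw):
    """Genetic-algorithm variable selection; each chromosome is a boolean column mask."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    pop = rng.random((pop_size, p)) < 0.5
    fit = np.array([cv_mse(X, y, ind, **kw) for ind in pop])
    for _ in range(generations):
        children = [pop[fit.argmin()].copy()]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            # binary tournament selection of two parents (lower CV-MSE wins)
            i, j = rng.integers(pop_size, size=2)
            a = pop[i] if fit[i] < fit[j] else pop[j]
            i, j = rng.integers(pop_size, size=2)
            b = pop[i] if fit[i] < fit[j] else pop[j]
            cut = rng.integers(1, p)  # single-point crossover
            child = np.concatenate((a[:cut], b[cut:]))
            child ^= rng.random(p) < p_mut  # bit-flip mutation
            children.append(child)
        pop = np.array(children)
        fit = np.array([cv_mse(X, y, ind, **kw) for ind in pop])
    best = fit.argmin()
    return pop[best], fit[best]
```

Usage, on synthetic data standing in for a descriptor matrix (not the paper's peptides): `X = np.random.default_rng(1).normal(size=(60, 8))`, build a response from a few columns, then call `ga_select(X, y, gamma=50.0, sigma=3.0)` to obtain the selected column mask and its cross-validated MSE. Because the best mask is copied unchanged into each new generation, the best CV-MSE never gets worse from one generation to the next. A PLS variant would simply replace `lssvm_fit`/`lssvm_predict` inside `cv_mse`.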