School of Computer and Information Engineering, Zhejiang Gongshang University, Hangzhou, Zhejiang 310018, P.R. China and Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 2V4, Canada.
Bioinformatics. 2014 Feb 15;30(4):497-505. doi: 10.1093/bioinformatics/btt716. Epub 2013 Dec 12.
The Gaussian network model (GNM) is widely used to analyze and understand protein dynamics, function and conformational changes. Existing GNM-based approaches require the atomic coordinates of the protein and cannot be used when only the sequence is known.
We report the first GNM that can be built from the sequence alone. Our linear regression-based, parameter-free, sequence-derived GNM (L-pfSeqGNM) uses contact maps predicted from the sequence and models local (in-sequence) contact neighborhoods with linear regression. Empirical benchmarking shows relatively high correlations between the native B-factors and those predicted with L-pfSeqGNM, and between the cross-correlations of residue fluctuations derived from the structure-based and the sequence-based GNMs. Our results demonstrate that L-pfSeqGNM is an attractive platform for exploring protein dynamics. In contrast to the widely used GNMs, which require protein structures that number in the thousands, our model can be used to study motions for the millions of readily available sequences, with applications in modeling conformational changes, protein-protein interactions and protein functions.
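To make the quantities mentioned above concrete, the following is a minimal sketch of the standard GNM calculation applied to a contact map: it builds the Kirchhoff (connectivity) matrix, inverts it over its non-zero modes, and reads off per-residue fluctuations (proportional to B-factors) and normalized cross-correlations. It is not the L-pfSeqGNM method itself; the linear regression over local contact neighborhoods is not reproduced here. The function names, the `threshold` parameter used to binarize predicted contact probabilities, and the toy chain are all assumptions made for illustration.

```python
import numpy as np


def kirchhoff_from_contacts(contacts, threshold=0.5):
    """Build a Kirchhoff matrix from a (possibly predicted) contact map.

    `threshold` is a hypothetical cutoff for turning predicted contact
    probabilities into binary contacts; it is not taken from the paper.
    """
    c = (np.asarray(contacts, dtype=float) >= threshold).astype(float)
    c = np.maximum(c, c.T)          # enforce symmetry
    np.fill_diagonal(c, 0.0)        # a residue is not its own contact
    gamma = -c                      # off-diagonal: -1 for each contact
    np.fill_diagonal(gamma, c.sum(axis=1))  # diagonal: contact count
    return gamma


def gnm_fluctuations(contacts, threshold=0.5, tol=1e-8):
    """Return (mean-square fluctuations, normalized cross-correlations).

    Assumes every residue has at least one contact, so that no diagonal
    entry of the pseudo-inverse is zero.
    """
    gamma = kirchhoff_from_contacts(contacts, threshold)
    evals, evecs = np.linalg.eigh(gamma)
    keep = evals > tol              # discard the zero (rigid-body) mode(s)
    gamma_inv = (evecs[:, keep] / evals[keep]) @ evecs[:, keep].T
    msf = np.diag(gamma_inv).copy() # proportional to B-factors
    cross = gamma_inv / np.sqrt(np.outer(msf, msf))
    return msf, cross


# Toy example: a 6-residue chain where only sequence neighbors are in contact.
n = 6
contacts = np.zeros((n, n))
for i in range(n - 1):
    contacts[i, i + 1] = 1.0

msf, cross = gnm_fluctuations(contacts)
print(np.round(msf, 3))
```

In this toy chain the terminal residues come out as the most mobile, which matches the usual observation that chain ends tend to have higher B-factors. In a sequence-only setting, `contacts` would hold the output of a contact predictor rather than distances computed from coordinates.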