Mathematical and Computer Sciences & Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia.
BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S34. doi: 10.1186/1471-2105-12-S1-S34.
Hydrogen bonds (H-bonds) play a key role in both the formation and stabilization of protein structures. They form and break while a protein deforms, for instance during the transition from a non-functional to a functional state. The intrinsic strength of an individual H-bond has been studied from an energetic viewpoint, but energy alone may not be a very good predictor of H-bond stability.
This paper describes inductive learning methods to train protein-independent probabilistic models of H-bond stability from molecular dynamics (MD) simulation trajectories of various proteins. The training data contain 32 input attributes (predictors) that describe an H-bond and its local environment in a conformation c. The output attribute is the probability that the H-bond will be present in an arbitrary conformation of this protein achievable from c within a time duration Δ. We model the dependence of the output variable on the predictors with a regression tree.
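The setup above can be illustrated with a minimal sketch. It makes two assumptions that the abstract does not state: that the output probability is estimated as the fraction of trajectory frames within the window Δ in which the H-bond is present, and that the tree is grown by greedily minimizing the sum of squared errors. The function names, the two attributes in the usage note, and the stopping parameters `min_leaf` and `max_depth` are hypothetical and are not taken from the paper.

```python
from statistics import mean


def empirical_stability(present, t, window):
    """Fraction of the `window` frames after frame t in which the H-bond is present.

    `present` is a list of 0/1 flags, one per trajectory frame (assumed encoding).
    """
    future = present[t + 1 : t + 1 + window]
    if not future:
        raise ValueError("no frames available after t")
    return sum(future) / len(future)


def sse(values):
    """Sum of squared errors of `values` around their mean."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, min_leaf=2, max_depth=3, depth=0):
    """Grow a regression tree by greedy SSE-minimizing binary splits.

    rows: list of attribute vectors; targets: stability values in [0, 1].
    """
    best = None
    if depth < max_depth and len(rows) >= 2 * min_leaf:
        for j in range(len(rows[0])):
            for threshold in sorted({r[j] for r in rows}):
                left = [i for i, r in enumerate(rows) if r[j] <= threshold]
                right = [i for i, r in enumerate(rows) if r[j] > threshold]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                cost = sse([targets[i] for i in left]) + sse([targets[i] for i in right])
                if best is None or cost < best[0]:
                    best = (cost, j, threshold, left, right)
    # Stop when no admissible split reduces the error: emit a leaf.
    if best is None or best[0] >= sse(targets):
        return {"leaf": mean(targets)}
    _, j, threshold, left, right = best
    return {
        "attr": j,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left], [targets[i] for i in left],
                           min_leaf, max_depth, depth + 1),
        "right": build_tree([rows[i] for i in right], [targets[i] for i in right],
                            min_leaf, max_depth, depth + 1),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's mean stability."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["attr"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]
```

As a usage illustration with made-up data, four H-bond occurrences described by two hypothetical attributes (a donor-acceptor distance and an angle) could be fitted with `build_tree([[2.8, 10.0], [2.9, 12.0], [3.4, 40.0], [3.5, 45.0]], [0.9, 0.8, 0.2, 0.1])`, after which `predict(tree, [2.85, 11.0])` returns the mean stability of the short-distance leaf.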
Several models are built using 6 MD simulation trajectories containing over 4000 distinct H-bonds (millions of occurrences). Experimental results demonstrate that such models can predict H-bond stability quite well. They perform roughly 20% better than models based on H-bond energy alone. In addition, they can accurately identify a large fraction of the least stable H-bonds in a conformation. In most tests, about 80% of the 10% of H-bonds predicted to be the least stable are in fact among the 10% that are truly least stable. The important attributes identified during tree construction are consistent with previous findings.
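The "least stable 10%" comparison can be read as an overlap between two rankings. The sketch below is one plausible way to compute such a figure, not necessarily the evaluation procedure used in the paper; the function name `least_stable_overlap` and the rounding rule for the subset size are assumptions.

```python
def least_stable_overlap(predicted, actual, fraction=0.10):
    """Share of the predicted least-stable H-bonds that are truly least stable.

    predicted, actual: stability values for the same H-bonds, in the same order.
    fraction: size of the least-stable subset (0.10 for the bottom 10%).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    # Assumed rule: round to the nearest count, but keep at least one H-bond.
    k = max(1, round(len(predicted) * fraction))
    order_pred = sorted(range(len(predicted)), key=lambda i: predicted[i])
    order_true = sorted(range(len(actual)), key=lambda i: actual[i])
    bottom_pred = set(order_pred[:k])
    bottom_true = set(order_true[:k])
    return len(bottom_pred & bottom_true) / k
```

Under this reading, a return value of about 0.8 with `fraction=0.10` would correspond to the result quoted above.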
We use inductive learning methods to build protein-independent probabilistic models to study H-bond stability, and demonstrate that the models perform better than H-bond energy alone.