Mandalaparthy Varun, Sanaboyana Venkata Ramana, Rafalia Hitesh, Gosavi Shachi
Simons Centre for the Study of Living Machines, National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road, Bangalore, 560065, India.
Manipal University, Madhav Nagar, Manipal, 576104, India.
Proteins. 2018 Feb;86(2):248-262. doi: 10.1002/prot.25438. Epub 2017 Dec 19.
One of the main barriers to accurate computational protein structure prediction is searching the vast space of protein conformations. Distance restraints or inter-residue contacts have been used to reduce this search space, easing the discovery of the correct folded state. It has been suggested that about 1 contact for every 12 residues may be sufficient to predict structure at fold-level accuracy. Here, we use coarse-grained structure-based models in conjunction with molecular dynamics simulations to examine this empirical prediction. We generate sparse contact maps for 15 proteins of varying sequence lengths and topologies and find that, given perfect secondary-structure information, a small fraction of the native contact map (5%-10%) suffices to fold proteins to their correct native states. We also find that different sparse maps are not equivalent, and we make several observations about the types of maps that succeed at such structure prediction. Long-range contacts are found to encode more information than shorter-range ones, especially for α- and αβ-proteins; this distinction is smaller for β-proteins. Choosing contacts that are a consensus from successful maps gives predictive sparse maps, as does choosing contacts that are well spread out over the protein structure. Additionally, the folding of proteins can be used to choose predictive sparse maps. Overall, we conclude that structure-based models can be used to understand the efficacy of structure-prediction restraints and could, in future, be tuned to include specific force-field interactions, secondary-structure errors and noise in the sparse maps.
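To make the idea of a sparse contact map concrete, here is a minimal sketch of drawing a fixed fraction of contacts from a native contact list, optionally restricted to long-range pairs. It is an illustration under stated assumptions, not the authors' protocol: the function names, the 8 Å distance cutoff, the minimum sequence separation of 4, and the long-range threshold passed as `long_range_sep` are all hypothetical choices that do not come from the abstract.

```python
import random


def native_contacts(coords, cutoff=8.0, min_sep=4):
    """List residue pairs (i, j) whose coarse-grained beads lie within `cutoff`.

    `coords` is a list of (x, y, z) tuples, one bead per residue.
    The cutoff and minimum sequence separation are assumed values.
    """
    contacts = []
    n = len(coords)
    for i in range(n):
        for j in range(i + min_sep, n):
            d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            if d2 <= cutoff ** 2:
                contacts.append((i, j))
    return contacts


def sparse_map(contacts, fraction, seed=0, long_range_sep=None):
    """Randomly keep about `fraction` of the full native contact list.

    If `long_range_sep` is given, only pairs with j - i >= long_range_sep
    are eligible. The target size is still computed from the full map,
    then capped at the number of eligible pairs.
    """
    rng = random.Random(seed)  # seeded so a given map can be regenerated
    pool = [c for c in contacts
            if long_range_sep is None or c[1] - c[0] >= long_range_sep]
    k = min(max(1, round(fraction * len(contacts))), len(pool))
    return sorted(rng.sample(pool, k))


def contacts_for_length(n_residues, residues_per_contact=12):
    """Contact count implied by the '1 contact per 12 residues' estimate."""
    return n_residues // residues_per_contact
```

Varying `seed` yields different sparse maps of the same size, which is one simple way to set up the comparison between maps that the abstract describes; how the resulting restraints enter a structure-based model is outside the scope of this sketch.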