School of Computer Engineering and Science, Shanghai University, Shanghai, People's Republic of China.
PLoS One. 2013 Apr 23;8(4):e61533. doi: 10.1371/journal.pone.0061533. Print 2013.
Despite continuing progress in X-ray crystallography and high-field NMR spectroscopy for determining three-dimensional protein structures, the number of unsolved and newly discovered sequences grows much faster than the number of determined structures. As computational science develops, protein modeling methods may bridge this large sequence-structure gap. Predicting a protein's three-dimensional structure from its primary structure (residue sequence) alone remains a grand challenge. Predicting residue contact maps, however, is a crucial and promising intermediate step toward full three-dimensional structure prediction. Better predictions of local and non-local contacts between residues can turn protein sequence alignment into structure alignment, which in turn can greatly improve template-based three-dimensional structure predictors.
This paper presents CNNcon, an improved contact map predictor based on multiple neural networks: six sub-networks and one final cascade network. Both the sub-networks and the cascade network were trained and tested on their corresponding data sets. For testing, the target protein was first encoded and then input to its corresponding sub-networks for prediction. The intermediate results were then input to the cascade network to produce the final prediction.
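The two-stage data flow described above can be sketched roughly as follows. This is only an illustration of the cascade idea, not the authors' implementation: the names `cascade_predict`, `SubNetwork`, and `CascadeNetwork` are hypothetical, the networks are stood in for by plain callables, and the rule for choosing which sub-networks correspond to a target is not modeled (the caller passes in the ones that apply).

```python
from typing import Callable, List, Sequence

# Hypothetical types: a sub-network maps an encoded residue pair to a score,
# and the cascade network maps the list of sub-network scores to a final score.
Features = Sequence[float]
SubNetwork = Callable[[Features], float]
CascadeNetwork = Callable[[List[float]], float]


def cascade_predict(
    features: Features,
    sub_networks: Sequence[SubNetwork],
    cascade: CascadeNetwork,
) -> float:
    """Run the selected sub-networks, then feed their outputs to the cascade."""
    # Stage 1: each selected sub-network produces an intermediate result.
    intermediate = [net(features) for net in sub_networks]
    # Stage 2: the cascade network combines the intermediate results.
    return cascade(intermediate)


# Toy stand-ins (assumptions for illustration only): six weighted-mean
# "sub-networks" and a cascade that simply averages their scores.
toy_sub_networks = [
    (lambda f, w=w: w * sum(f) / len(f)) for w in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0)
]
toy_cascade = lambda scores: sum(scores) / len(scores)

final_score = cascade_predict([1.0, 1.0], toy_sub_networks, toy_cascade)
```

In a real predictor the callables would be trained models and the features would come from the protein encoding step; the sketch fixes only the order of operations, which is the part the text states.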
CNNcon correctly predicts 58.86% of contacts on average at a distance cutoff of 8 Å for proteins with lengths ranging from 51 to 450. The comparison results show that the method performs better than the state-of-the-art predictors it was compared against. In particular, prediction accuracy remains steady as protein sequence length increases. This indicates that CNNcon overcomes the thin density problem, which troubles other current predictors. This advantage makes the method valuable for long proteins, for which effective prediction may therefore become possible.
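To make the 8 Å cutoff and the accuracy figure concrete, here is a minimal sketch of how a contact map and a contact accuracy could be computed. Several details are assumptions rather than the paper's stated protocol: one representative atom per residue, a strict `<` comparison against the cutoff, and accuracy defined as the fraction of predicted contacts that are true contacts. The function names and the `min_separation` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map(coords: Sequence[Coord], cutoff: float = 8.0) -> List[List[bool]]:
    """Mark residue pairs whose representative atoms lie closer than the cutoff (in Å)."""
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                cmap[i][j] = cmap[j][i] = True  # the map is symmetric
    return cmap


def contact_accuracy(
    predicted: List[List[bool]],
    native: List[List[bool]],
    min_separation: int = 1,
) -> float:
    """Fraction of predicted contacts that are also native contacts (assumed definition)."""
    true_pos = false_pos = 0
    n = len(native)
    for i in range(n):
        for j in range(i + min_separation, n):
            if predicted[i][j]:
                if native[i][j]:
                    true_pos += 1
                else:
                    false_pos += 1
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


# Toy example: four residues spaced 3.8 Å apart along a straight line.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
native = contact_map(coords)

# Hypothetical prediction with one wrong pair, (0, 3), which is 11.4 Å apart.
predicted = [[False] * 4 for _ in range(4)]
for i, j in [(0, 2), (0, 3), (1, 2), (1, 3)]:
    predicted[i][j] = predicted[j][i] = True

accuracy = contact_accuracy(predicted, native)  # 3 of 4 predicted contacts are native
```

Raising `min_separation` restricts the evaluation to pairs farther apart in the sequence, which is how non-local contacts would be scored separately from local ones.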