Pollastri Gianluca, Vullo Alessandro, Frasconi Paolo, Baldi Pierre
School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland.
J Comput Biol. 2006 Apr;13(3):631-50. doi: 10.1089/cmb.2006.13.631.
We develop and test machine learning methods for the prediction of coarse 3D protein structures, where a protein is represented by a set of rigid rods associated with its secondary structure elements (alpha-helices and beta-strands). First, we employ cascades of recursive neural networks derived from graphical models to predict the relative placements of segments. These are represented as discretized distance and angle maps, and the discretization levels are statistically inferred from a large and curated dataset. Coarse 3D folds of proteins are then assembled starting from topological information predicted in the first stage. Reconstruction is carried out by minimizing a cost function taking the form of a purely geometrical potential. We show that the proposed architecture outperforms simpler alternatives and can accurately predict binary and multiclass coarse maps. The reconstruction procedure proves to be fast and often leads to topologically correct coarse structures that could be exploited as a starting point for various protein modeling strategies. The fully integrated rod-shaped protein builder (predictor of contact maps + reconstruction algorithm) can be accessed at http://distill.ucd.ie/.
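The coarse maps described in the abstract can be illustrated with a minimal sketch. It assumes each secondary structure element is a rod given by two 3D endpoints, that the distance between two rods is measured between their midpoints, and that the angle is taken between their direction vectors. The threshold values and the names `DISTANCE_THRESHOLDS`, `ANGLE_THRESHOLDS`, and `coarse_maps` are hypothetical: the paper infers its discretization levels statistically from data, and this is not the authors' implementation.

```python
import math
from bisect import bisect_right

# Hypothetical class boundaries (assumptions, not values from the paper).
DISTANCE_THRESHOLDS = [10.0, 20.0]   # angstroms: classes 0 (near), 1, 2 (far)
ANGLE_THRESHOLDS = [45.0, 135.0]     # degrees: 0 parallel, 1 perpendicular, 2 antiparallel


def midpoint(segment):
    """Midpoint of a rod given as (start_xyz, end_xyz)."""
    start, end = segment
    return tuple((a + b) / 2.0 for a, b in zip(start, end))


def direction(segment):
    """Unit vector pointing from the start of the rod to its end."""
    start, end = segment
    vec = tuple(b - a for a, b in zip(start, end))
    norm = math.sqrt(sum(c * c for c in vec))
    return tuple(c / norm for c in vec)


def coarse_maps(segments):
    """Return (distance_map, angle_map) as square lists of class indices."""
    n = len(segments)
    mids = [midpoint(s) for s in segments]
    dirs = [direction(s) for s in segments]
    distance_map = [[0] * n for _ in range(n)]
    angle_map = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist = math.dist(mids[i], mids[j])
            # Clamp the dot product so rounding error cannot break acos.
            cos = sum(a * b for a, b in zip(dirs[i], dirs[j]))
            cos = max(-1.0, min(1.0, cos))
            angle = math.degrees(math.acos(cos))
            # bisect_right maps a value to the index of the bin it falls in.
            distance_map[i][j] = bisect_right(DISTANCE_THRESHOLDS, dist)
            angle_map[i][j] = bisect_right(ANGLE_THRESHOLDS, angle)
    return distance_map, angle_map


# Three illustrative rods: two close parallel ones and a distant perpendicular one.
rods = [
    ((0.0, 0.0, 0.0), (10.0, 0.0, 0.0)),
    ((0.0, 5.0, 0.0), (10.0, 5.0, 0.0)),
    ((30.0, 0.0, 0.0), (30.0, 10.0, 0.0)),
]
distance_map, angle_map = coarse_maps(rods)
```

With two thresholds per map, each entry is one of three classes, which corresponds to the multiclass case; a single threshold would yield a binary map. In the approach the abstract describes, such maps are predicted from sequence and then used to constrain the reconstruction, whereas this sketch only shows how they could be computed from a known set of rods.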