Bodén Mikael, Yuan Zheng, Bailey Timothy L
School of Information Technology and Electrical Engineering, The University of Queensland, QLD 4072, St Lucia, Australia.
BMC Bioinformatics. 2006 Feb 14;7:68. doi: 10.1186/1471-2105-7-68.
The structure of proteins may change as a result of the inherent flexibility of some protein regions. We develop and explore probabilistic machine learning methods for predicting a continuum secondary structure, i.e. assigning probabilities to the conformational states of a residue. We train our methods using data derived from high-quality NMR models.
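The continuum targets can be pictured with a small sketch. It assumes that each NMR model of a protein has already been given a 3-state label per residue (H = helix, E = strand, C = coil, for example from a DSSP assignment reduced to three states), and that a residue's probability for a state is the fraction of models in which it adopts that state. The names `continuum_profile` and `categorical` are hypothetical, and this is not presented as the authors' exact procedure.

```python
from collections import Counter

# Assumed 3-state alphabet: helix, strand, coil.
STATES = ("H", "E", "C")

def continuum_profile(model_labels):
    """Per-residue state probabilities from an ensemble of NMR models.

    model_labels: list of equal-length strings, one per NMR model; each
    character is the (assumed pre-computed) 3-state label of one residue.
    Returns one dict per residue mapping state -> fraction of models.
    """
    n_models = len(model_labels)
    length = len(model_labels[0])
    if any(len(m) != length for m in model_labels):
        raise ValueError("all models must cover the same residues")
    profile = []
    for i in range(length):
        counts = Counter(m[i] for m in model_labels)
        profile.append({s: counts[s] / n_models for s in STATES})
    return profile

def categorical(profile):
    """Collapse each probability vector to its most probable state.

    Ties go to the state listed first in STATES.
    """
    return "".join(max(STATES, key=p.__getitem__) for p in profile)

models = ["HHHC", "HHEC", "HHEC", "HCEC"]
profile = continuum_profile(models)
print(profile[1])            # residue 1: helix in 3 of 4 models
print(categorical(profile))
```

Here residue 1 gets probability 0.75 for H and 0.25 for C, so the categorical view ("H") discards the information that one model places it in coil. The continuum profile keeps it.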
Several probabilistic models not only successfully estimate the continuum secondary structure, but also provide a categorical output on par with models directly trained on categorical data. Importantly, models trained on the continuum secondary structure are also better than their categorical counterparts at identifying the conformational state for structurally ambivalent residues.
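One way to make the comparison concrete is sketched below, under stated assumptions. Training on the continuum can use a cross-entropy loss against the soft target instead of a one-hot label. "Structurally ambivalent" is taken here to mean that the NMR models disagree, so no state reaches probability 1 in the target. Both the loss choice and this ambivalence criterion are illustrative assumptions, and `soft_cross_entropy` and `ambivalent_accuracy` are hypothetical helper names.

```python
import math

def soft_cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a continuum target and a predicted distribution.

    With a one-hot target this reduces to the usual categorical loss; with
    a continuum target every state that carries probability mass contributes.
    """
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))

def argmax(vec):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(vec)), key=vec.__getitem__)

def ambivalent_accuracy(targets, predictions, threshold=1.0):
    """Accuracy restricted to residues whose largest target probability < threshold.

    A prediction counts as correct when its most probable state matches the
    target's most probable state (assumed criterion).
    """
    hits = total = 0
    for t, p in zip(targets, predictions):
        if max(t) < threshold:
            total += 1
            hits += argmax(t) == argmax(p)
    return hits / total if total else float("nan")

# Probability vectors in (H, E, C) order.
targets = [(1.0, 0.0, 0.0), (0.75, 0.0, 0.25), (0.25, 0.75, 0.0)]
predictions = [(0.9, 0.05, 0.05), (0.4, 0.1, 0.5), (0.2, 0.7, 0.1)]
print(ambivalent_accuracy(targets, predictions))
```

Only the last two residues count as ambivalent in this toy example. The prediction misses the first of them and matches the second.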
Cascaded probabilistic neural networks trained on the continuum secondary structure exhibit better accuracy in structurally ambivalent regions of proteins, while sustaining an overall classification accuracy on par with standard, categorical prediction methods.
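In a cascade, a second network refines the first network's per-residue probabilities using their context along the chain. The sketch below shows only that data flow. It concatenates first-stage outputs in a sliding window, padded past the termini, and passes them through a single softmax layer. The window size, the padding value, the one-layer second stage, and the untrained weights are all assumptions for illustration, not the architecture or parameters used in the paper. `window_features` and `second_stage` are hypothetical names.

```python
import math

def window_features(probs, half_width, pad=(0.0, 0.0, 0.0)):
    """Concatenate first-stage outputs in a window centred on each residue.

    probs: per-residue probability vectors from the first network.
    Positions beyond either terminus are filled with `pad`.
    """
    n = len(probs)
    features = []
    for i in range(n):
        row = []
        for j in range(i - half_width, i + half_width + 1):
            row.extend(probs[j] if 0 <= j < n else pad)
        features.append(row)
    return features

def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def second_stage(features, weights, biases):
    """One softmax layer standing in for the trained second network.

    weights: one weight list per output state (hypothetical values here).
    """
    return [
        softmax([sum(w * x for w, x in zip(ws, row)) + b
                 for ws, b in zip(weights, biases)])
        for row in features
    ]

first_stage = [[0.8, 0.1, 0.1], [0.6, 0.1, 0.3], [0.2, 0.7, 0.1]]
feats = window_features(first_stage, half_width=1)
refined = second_stage(feats, weights=[[0.0] * 9] * 3, biases=[0.0, 0.0, 0.0])
print(refined[1])
```

With all-zero weights every refined vector is uniform. A trained second stage would instead learn how neighbouring probabilities should sharpen or soften each residue's prediction.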