Gassend Blaise, O'Donnell Charles W, Thies William, Lee Andrew, van Dijk Marten, Devadas Srinivas
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
BMC Bioinformatics. 2007 May 24;8 Suppl 5(Suppl 5):S3. doi: 10.1186/1471-2105-8-S5-S3.
Our goal is to develop a state-of-the-art protein secondary structure predictor built on an intuitive, biophysically motivated energy model. We treat structure prediction as an optimization problem, using parameterizable cost functions that represent biological "pseudo-energies". Machine learning methods estimate the parameter values so that the model correctly predicts known protein structures.
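To make the "prediction as optimization" idea concrete, the following is a minimal sketch, not the model described in this paper. It assumes a toy two-state labeling (H for helix, C for non-helix) in which the pseudo-energy of a labeling is the sum of per-residue terms and terms for adjacent label pairs. The function name `predict_min_energy` and the tables `TOY_RESIDUE_ENERGY` and `TOY_TRANSITION_ENERGY`, along with every numeric value in them, are hypothetical and chosen only for illustration. Under those assumptions, the lowest-energy labeling can be found by dynamic programming:

```python
from typing import Dict, List, Tuple

STATES = ("H", "C")  # H = helix, C = non-helix (toy two-state alphabet)

# Hypothetical pseudo-energies, invented for illustration only.
TOY_RESIDUE_ENERGY: Dict[Tuple[str, str], float] = {
    ("A", "H"): -1.0, ("A", "C"): 0.0,  # alanine rewarded in helix
    ("G", "H"): 1.0, ("G", "C"): 0.0,   # glycine penalized in helix
}
TOY_TRANSITION_ENERGY: Dict[Tuple[str, str], float] = {
    ("H", "H"): 0.0, ("C", "C"): 0.0,
    ("H", "C"): 0.5, ("C", "H"): 0.5,   # cost for starting or ending a helix
}


def predict_min_energy(
    seq: str,
    residue_energy: Dict[Tuple[str, str], float],
    transition_energy: Dict[Tuple[str, str], float],
) -> Tuple[str, float]:
    """Return the labeling with the lowest total pseudo-energy and that energy."""
    if not seq:
        return "", 0.0
    # best[s]: lowest energy of any labeling of the prefix that ends in state s
    best = {s: residue_energy[(seq[0], s)] for s in STATES}
    back: List[Dict[str, str]] = []
    for aa in seq[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for s in STATES:
            # Cheapest predecessor state for ending the prefix in state s
            prev = min(STATES, key=lambda p: best[p] + transition_energy[(p, s)])
            new_best[s] = (
                best[prev] + transition_energy[(prev, s)] + residue_energy[(aa, s)]
            )
            pointers[s] = prev
        best = new_best
        back.append(pointers)
    # Trace back from the cheapest final state to recover the labeling
    last = min(STATES, key=lambda s: best[s])
    labels = [last]
    for pointers in reversed(back):
        labels.append(pointers[labels[-1]])
    return "".join(reversed(labels)), best[last]


labels, energy = predict_min_energy(
    "AAAGG", TOY_RESIDUE_ENERGY, TOY_TRANSITION_ENERGY
)
print(labels, energy)  # HHHCC -2.5
```

In a learned model the table entries would be the parameters to estimate; here they are fixed by hand so the minimization step can be read in isolation.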
Focusing on the prediction of alpha helices in proteins, we show that a model with 302 parameters can achieve a Qα value of 77.6% and an SOVα value of 73.4%. These figures are among the best reported for techniques that do not rely on external databases (such as multiple sequence alignments). Further, it is easier to extract biological significance from a model with so few parameters.
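The abstract does not define its accuracy measures. As a rough illustration, the sketch below assumes that Qalpha (Qα) is the percentage of residues whose helix/non-helix label is predicted correctly; that definition is an assumption here, and the segment-overlap measure SOV is not covered. The function name `q_alpha` and the example strings are hypothetical.

```python
def q_alpha(true_labels: str, predicted_labels: str) -> float:
    """Percentage of positions where the predicted label matches the true one.

    Assumes both strings use the same two-letter alphabet,
    e.g. 'H' for helix and 'C' for non-helix.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label strings must have the same length")
    if not true_labels:
        raise ValueError("label strings must be non-empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


score = q_alpha("HHHHCCCC", "HHHCCCCH")
print(score)  # 75.0 (6 of 8 positions agree)
```

A per-residue score of this kind says nothing about whether helices are predicted as contiguous segments, which is presumably why a segment-overlap score is reported alongside it.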
The method presented shows promise for the prediction of protein secondary structure. Biophysically motivated elementary free energies can be learned using SVM techniques to construct an energy cost function whose predictive performance rivals the state of the art. The method is general and can be extended beyond the all-alpha case described here.