Baldi P, Brunak S, Frasconi P, Soda G, Pollastri G
Department of Information and Computer Science, College of Medicine, University of California, Irvine 92697-3425, USA.
Bioinformatics. 1999 Nov;15(11):937-46. doi: 10.1093/bioinformatics/15.11.937.
Predicting the secondary structure of a protein (alpha-helix, beta-sheet, coil) is an important step towards elucidating its three-dimensional structure, as well as its function. Presently, the best predictors are based on machine learning approaches, in particular neural network architectures with a fixed, and relatively short, input window of amino acids, centered at the prediction site. Although a fixed small window avoids overfitting problems, it does not permit capturing variable long-range information.
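To make the fixed-window setup concrete, the following is a minimal sketch of how a window of residues centered at the prediction site might be encoded as a network input. The one-hot encoding, the padding slot, the half-width of 6 (a 13-residue window), and the example sequence are all illustrative assumptions, not details taken from the abstract.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(residue):
    """Return a 21-dim vector: 20 residues plus one padding/unknown slot."""
    vec = [0.0] * (len(AMINO_ACIDS) + 1)
    idx = AMINO_ACIDS.find(residue)
    # Anything outside the 20-letter alphabet goes to the last slot.
    vec[idx if idx >= 0 else len(AMINO_ACIDS)] = 1.0
    return vec


def window_features(sequence, position, half_width=6):
    """Concatenate one-hot vectors for the residues in
    [position - half_width, position + half_width]; positions that fall
    off either end of the chain are encoded as padding."""
    features = []
    for i in range(position - half_width, position + half_width + 1):
        residue = sequence[i] if 0 <= i < len(sequence) else "-"
        features.extend(one_hot(residue))
    return features


seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary illustrative sequence
x = window_features(seq, position=0, half_width=6)
print(len(x))  # 13 window positions * 21 values each = 273
```

Whatever lies outside the window is invisible to such a predictor, which is the limitation the abstract points to: widening the window lets in more context but also multiplies the number of input weights.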
We introduce a family of novel architectures which can learn to make predictions based on variable ranges of dependencies. These architectures extend recurrent neural networks, introducing non-causal bidirectional dynamics to capture both upstream and downstream information. The prediction algorithm is completed by the use of mixtures of estimators that leverage evolutionary information, expressed in terms of multiple alignments, both at the input and output levels. While our system currently achieves an overall performance close to 76% correct prediction, at least comparable to the best existing systems, the main emphasis here is on the development of new algorithmic ideas.
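The bidirectional idea can be sketched as two recurrences run over the chain in opposite directions, with the output at each position computed from the local input together with both context states. The code below is a generic illustration under stated assumptions: the tanh state updates, the single linear softmax output layer, the layer sizes, and the random untrained weights are all hypothetical choices, and the sketch does not reproduce the authors' architecture, training procedure, or use of multiple alignments.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def one_hot(i, n):
    return [1.0 if j == i else 0.0 for j in range(n)]


def recur(inputs, W_in, W_state, n_state):
    """Run a simple tanh recurrence over inputs; return the state after each step."""
    state = [0.0] * n_state
    states = []
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(W_in, x), matvec(W_state, state))]
        state = [math.tanh(p) for p in pre]
        states.append(state)
    return states


def brnn_predict(inputs, params, n_state):
    """Per-position class probabilities from input, forward and backward states."""
    # Forward chain summarizes upstream context (positions 0..t).
    fwd = recur(inputs, params["Wf_in"], params["Wf_state"], n_state)
    # Backward chain runs on the reversed sequence, then is re-aligned,
    # so bwd[t] summarizes downstream context (positions t..end).
    bwd = recur(inputs[::-1], params["Wb_in"], params["Wb_state"], n_state)[::-1]
    return [softmax(matvec(params["W_out"], x + f + b))
            for x, f, b in zip(inputs, fwd, bwd)]


rng = random.Random(0)
n_in, n_state, n_classes = 21, 8, 3  # assumed sizes; 3 classes = helix, sheet, coil
params = {
    "Wf_in": rand_matrix(n_state, n_in, rng),
    "Wf_state": rand_matrix(n_state, n_state, rng),
    "Wb_in": rand_matrix(n_state, n_in, rng),
    "Wb_state": rand_matrix(n_state, n_state, rng),
    "W_out": rand_matrix(n_classes, n_in + 2 * n_state, rng),
}
inputs = [one_hot(rng.randrange(n_in), n_in) for _ in range(10)]
probs = brnn_predict(inputs, params, n_state)
print(len(probs), len(probs[0]))  # one 3-class distribution per residue
```

Because the backward state at a position depends on every residue after it, changing the last residue of the chain can alter the prediction at the first position. A fixed window cannot do this, and the range of influence here is learned rather than set in advance.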
The executable program for predicting protein secondary structure is available from the authors free of charge.
pfbaldi@ics.uci.edu, gpollast@ics.uci.edu, brunak@cbs.dtu.dk, paolo@dsi.unifi.it.