Vullo Alessandro, Frasconi Paolo
Department of Systems and Computer Science, Università di Firenze, Via di S. Marta 3, 50139-I Firenze, Italy.
Bioinformatics. 2004 Mar 22;20(5):653-9. doi: 10.1093/bioinformatics/btg463. Epub 2004 Jan 22.
We focus on the prediction of disulfide bridges in proteins starting from their amino acid sequence and from the knowledge of the disulfide bonding state of each cysteine. The location of disulfide bridges is a structural feature that conveys important information about the protein main chain conformation and can therefore help towards the solution of the folding problem. Existing approaches based on weighted graph matching algorithms do not take advantage of evolutionary information. Recursive neural networks (RNN), on the other hand, can handle in a natural way complex data structures such as graphs whose vertices are labeled by real vectors, allowing us to incorporate multiple alignment profiles in the graphical representation of disulfide connectivity patterns.
The core of the method is the use of machine learning tools to rank alternative disulfide connectivity patterns. We develop an ad-hoc RNN architecture for scoring labeled undirected graphs that represent connectivity patterns. In order to compare our algorithm with previous methods, we report experimental results on the SWISS-PROT 39 dataset. We find that using multiple alignment profiles allows us to obtain significant prediction accuracy improvements, clearly demonstrating the important role played by evolutionary information.
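The ranking step described above can be illustrated with a minimal sketch. When the bonding state is known, a connectivity pattern is a perfect matching over the bonded cysteines, so a protein with B bridges has (2B-1)!! candidate patterns (3 for two bridges, 15 for three, 105 for four). The code below enumerates those candidates and sorts them by a score. All names here (`connectivity_patterns`, `rank_patterns`, `toy_score`) are hypothetical, and `toy_score` is only a placeholder that favors short sequence separation; it is not the RNN scorer of the paper, which would instead score the profile-labeled graph of each pattern.

```python
from typing import Callable, Iterator, Sequence, Tuple

Bridge = Tuple[int, int]       # sequence positions of two bonded cysteines
Pattern = Tuple[Bridge, ...]   # one candidate connectivity pattern


def connectivity_patterns(cysteines: Sequence[int]) -> Iterator[Pattern]:
    """Yield every way to pair up an even number of bonded cysteines."""
    cysteines = tuple(cysteines)
    if len(cysteines) % 2 != 0:
        raise ValueError("need an even number of bonded cysteines")
    if not cysteines:
        yield ()
        return
    first, rest = cysteines[0], cysteines[1:]
    # Pair the first cysteine with each possible partner, then recurse.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in connectivity_patterns(remaining):
            yield ((first, partner),) + tail


def rank_patterns(
    cysteines: Sequence[int], score: Callable[[Pattern], float]
) -> list:
    """Return all candidate patterns, highest score first."""
    return sorted(connectivity_patterns(cysteines), key=score, reverse=True)


def toy_score(pattern: Pattern) -> float:
    """Placeholder scorer (assumption): prefer bridges that are close in sequence."""
    return -float(sum(abs(a - b) for a, b in pattern))


if __name__ == "__main__":
    ranked = rank_patterns((5, 12, 30, 41), toy_score)
    print(len(ranked))   # 3 candidate patterns for two bridges
    print(ranked[0])     # ((5, 12), (30, 41))
```

Exhaustive enumeration is practical only for a small number of bridges, since the candidate count grows as a double factorial; swapping `toy_score` for a learned scoring function leaves the ranking logic unchanged.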
The Web interface of the predictor is available at http://neural.dsi.unifi.it/cysteines