Kaur Harpreet, Raghava G P S
Bioinformatics Centre, Institute of Microbial Technology, Sector 39A, Chandigarh, India.
FEBS Lett. 2004 Apr 23;564(1-2):47-57. doi: 10.1016/S0014-5793(04)00305-9.
In this study, we developed a neural network-based method for predicting protein segments that contain aromatic-backbone NH (Ar-NH) interactions, using multiple sequence alignment. We analyzed 3121 seven-residue segments containing Ar-NH interactions, extracted from 2298 non-redundant protein structures in which no two proteins share more than 25% sequence identity. Two consecutive feed-forward neural networks, each with a single hidden layer, were trained with standard back-propagation as the learning algorithm. The method consists of (i) a sequence-to-structure network, which predicts the aromatic residues involved in Ar-NH interactions from a multiple alignment of protein sequences, and (ii) a structure-to-structure network, whose input consists of the output of the first network and the predicted secondary structure. The Matthews correlation coefficient (MCC) improves from 0.12 to 0.15 when evolutionary information (a multiple alignment obtained from PSI-BLAST) is used as input instead of a single sequence, and improves further from 0.15 to 0.20 when secondary structure information predicted by PSIPRED is incorporated. The final network yields an overall prediction accuracy of 70.1% and an MCC of 0.20 when tested by five-fold cross-validation. Overall, the performance is 15.2% higher than random prediction. In addition, the actual position of the donor residue within the 'potential' predicted fragment is predicted by a separate sequence-to-structure neural network. Based on this study, a web server, Ar_NHPred, has been developed that predicts Ar-NH interactions in a given amino acid sequence; a mirror site is also available.
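The two figures of merit reported above, overall accuracy and MCC, can both be computed from the counts of true and false positives and negatives. The following is a minimal sketch of the standard definitions, not the authors' code; the function names and the 0/1 label encoding (1 for a segment containing an Ar-NH interaction) are assumptions made for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = Ar-NH segment, 0 = other).

    The 0/1 encoding is an assumption of this sketch.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in the range [-1, 1].

    Returns 0.0 when the denominator vanishes (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy labels, purely illustrative; not data from the study.
    y_true = [1, 0, 1, 0, 1, 0]
    y_pred = [1, 0, 0, 0, 1, 1]
    counts = confusion_counts(y_true, y_pred)
    print("accuracy:", accuracy(*counts))
    print("MCC:", mcc(*counts))
```

Unlike accuracy, MCC uses all four counts in a balanced way, so a modest value such as 0.20 can coexist with an accuracy of about 70% when the classes are hard to separate.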