Improved prediction for N-termini of alpha-helices using empirical information.

Author information

Wilson Claire L, Boardman Paul E, Doig Andrew J, Hubbard Simon J

Affiliation

Department of Biomolecular Sciences, University of Manchester Institute of Science and Technology, Manchester, United Kingdom.

Publication information

Proteins. 2004 Nov 1;57(2):322-30. doi: 10.1002/prot.20218.

DOI:10.1002/prot.20218

PMID:15340919

Abstract

The prediction of the secondary structure of proteins from their amino acid sequences remains a key component of many approaches to the protein folding problem. The most abundant form of regular secondary structure in proteins is the alpha-helix, in which specific residue preferences exist at the N-terminal locations. Propensities derived from these observed amino acid frequencies in the Protein Data Bank (PDB) database correlate well with experimental free energies measured for residues at different N-terminal positions in alanine-based peptides. We report a novel method to exploit these data to improve protein secondary structure prediction through identification of the correct N-terminal sequences in alpha-helices, based on existing popular methods for secondary structure prediction. With this algorithm, the percentage of correctly predicted alpha-helix start positions improved from 30% to 38%, while the overall prediction accuracy (Q3) remained the same, using cross-validated testing. Although the algorithm was developed and tested on multiple sequence alignment-based secondary structure predictions, it was also able to improve the predictions of start locations by methods that use single sequences to make their predictions. Furthermore, the residue frequencies at N-terminal positions of the improved predictions better reflect those seen at the N-terminal positions of alpha-helices in proteins. This has implications for areas such as comparative modeling, where a more accurate prediction of the N-terminal regions of alpha-helices should benefit attempts to model adjacent loop regions. The algorithm is available as a Web tool, located at http://rocky.bms.umist.ac.uk/elephant.
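
The two figures quoted above, Q3 and the fraction of correctly predicted helix start positions, measure different things, and a small sketch may help show how one can change while the other does not. The code below is an illustration only and is not the authors' implementation. Q3 is computed in its usual sense, as the fraction of residues with the correct three-state label (H, E or C). The abstract does not say how a start position is scored as correct, so the exact-match rule, the `tolerance` parameter, the function names and the toy strings are all assumptions.

```python
from typing import List


def q3(pred: str, obs: str) -> float:
    """Fraction of residues whose three-state label (H, E, C) matches."""
    if len(pred) != len(obs):
        raise ValueError("predicted and observed strings must be the same length")
    return sum(p == o for p, o in zip(pred, obs)) / len(obs)


def helix_starts(ss: str) -> List[int]:
    """0-based indices at which a run of 'H' begins."""
    return [i for i, s in enumerate(ss)
            if s == "H" and (i == 0 or ss[i - 1] != "H")]


def start_accuracy(pred: str, obs: str, tolerance: int = 0) -> float:
    """Fraction of observed helix starts matched by a predicted start.

    Assumption: a start counts as correct when a predicted start lies
    within `tolerance` residues of it (0 means an exact match).
    """
    observed = helix_starts(obs)
    predicted = helix_starts(pred)
    if not observed:
        return 0.0
    hits = sum(any(abs(o - p) <= tolerance for p in predicted)
               for o in observed)
    return hits / len(observed)


# Toy data (hypothetical, not taken from the paper).
obs = "CCHHHHHHCCCEEEECCHHHHHCC"     # observed structure
before = "CHHHHHHHCCCEEEECCHHHHHCC"  # first helix starts one residue early
after = "CCHHHHHHCCCEEEECCHHHHHHC"   # starts fixed, one helix end now wrong

for name, pred in (("before", before), ("after", after)):
    print(name, round(q3(pred, obs), 3), start_accuracy(pred, obs))
```

In this toy case both predictions get one residue wrong, so Q3 is identical, yet only the second places every helix start correctly. That is the kind of change the abstract reports: better start positions with overall accuracy unchanged.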
