Department of Medical Informatics, Tzu Chi University, Hualien, Taiwan, Republic of China.
PLoS One. 2012;7(10):e47951. doi: 10.1371/journal.pone.0047951. Epub 2012 Oct 24.
Predicting protein catalytic residues provides useful information for studying protein function. Most existing methods combine structure and sequence information but rely heavily on sequence conservation derived from multiple sequence alignments, and structure information usually contributes less than sequence conservation. We found a novel structure feature, residue side chain orientation, which is the first structure-based feature to achieve prediction results comparable to those of evolutionary sequence conservation. We developed a structure-based method, Enzyme Catalytic residue SIde-chain Arrangement (EXIA), which is based on residue side chain orientations and the backbone flexibility of the protein structure. Predictions made with EXIA outperform those made with existing structure-based features, and the prediction quality of combining EXIA with sequence conservation exceeds that of state-of-the-art prediction methods. EXIA is designed to predict catalytic residues from a single protein structure without needing sequence or structure alignments, so it provides invaluable information when homology information for the target protein is insufficient or unreliable. Catalytic residues have a distinctive side chain orientation, and EXIA was designed around this newly discovered feature. EXIA also performs well on a dataset of enzymes without any bound ligand in their crystallographic structures.
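The abstract does not state how side chain orientation is quantified, so the following is only an illustrative sketch under a stated assumption: orientation is taken to be the angle between the vector from a residue's C-alpha atom to its side chain centroid and the vector from the C-alpha atom to the protein centroid. The function names `centroid` and `orientation_angle` and this angle definition are hypothetical and are not EXIA's actual formulation.

```python
import math


def centroid(points):
    """Arithmetic mean of a non-empty list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def orientation_angle(ca, side_chain_atoms, protein_centroid):
    """Angle in degrees between CA->side-chain-centroid and CA->protein-centroid.

    Assumed (hypothetical) orientation measure: small angles mean the side
    chain points toward the protein centroid, large angles mean it points away.
    """
    sc = centroid(side_chain_atoms)
    u = tuple(sc[i] - ca[i] for i in range(3))
    v = tuple(protein_centroid[i] - ca[i] for i in range(3))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("orientation is undefined for a zero-length vector")
    cos_theta = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))
```

In this sketch the coordinates would come from a single structure file, which mirrors the abstract's point that no sequence or structure alignment is needed; how such a per-residue value would be combined with backbone flexibility is not specified in the abstract and is left out here.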