Ye Kai, Feenstra K Anton, Heringa Jaap, Ijzerman Adriaan P, Marchiori Elena
Division of Medical Chemistry, LACDR, Leiden University, P.O. Box 9502, 2300 RA, Leiden, The Netherlands.
Bioinformatics. 2008 Jan 1;24(1):18-25. doi: 10.1093/bioinformatics/btm537. Epub 2007 Nov 17.
Identification of residues that account for protein function specificity is crucial, not only for understanding the nature of functional specificity, but also for protein engineering experiments aimed at switching the specificity of an enzyme, regulator or transporter. Available algorithms generally use multiple sequence alignments to identify residue positions that are conserved within subfamilies but divergent between them. However, many biological examples show a much subtler picture than simple intra-group conservation versus inter-group divergence.
We present multi-RELIEF, a novel approach for identifying specificity residues that is based on RELIEF, a state-of-the-art machine-learning technique for feature weighting. It estimates the expected 'local' functional specificity of residues from an alignment divided into multiple classes. Optionally, 3D structure information is exploited by increasing the weight of residues that have high-weight neighbors. Using ROC curves over a large body of experimental reference data, we show that (a) multi-RELIEF identifies specificity residues for the seven test sets used, (b) incorporating structural information improves prediction for specificity of interaction with small molecules and (c) comparison of multi-RELIEF with four other state-of-the-art algorithms indicates its robustness and best overall performance.
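To illustrate the kind of feature weighting RELIEF performs, the following is a minimal sketch of the basic RELIEF update applied to alignment columns. It is not the authors' multi-RELIEF implementation: the function name `relief_weights`, the use of Hamming distance between aligned sequences, and the toy alignment are all assumptions made for illustration, and the multi-class sampling scheme and the optional 3D-neighbor step are not reproduced.

```python
def relief_weights(seqs, labels):
    """Basic RELIEF over alignment columns (illustrative sketch only).

    seqs   : equal-length aligned sequences (strings)
    labels : class (subfamily) label for each sequence
    Returns one weight per column; higher means more class-specific.
    """
    n, length = len(seqs), len(seqs[0])
    weights = [0.0] * length

    def hamming(a, b):
        # Assumed distance: number of mismatching aligned positions.
        return sum(x != y for x, y in zip(a, b))

    used = 0
    for i in range(n):
        hits = [j for j in range(n) if j != i and labels[j] == labels[i]]
        misses = [j for j in range(n) if labels[j] != labels[i]]
        if not hits or not misses:
            continue  # need at least one same-class and one other-class sequence
        # Nearest sequence from the same class and from a different class.
        hit = min(hits, key=lambda j: hamming(seqs[i], seqs[j]))
        miss = min(misses, key=lambda j: hamming(seqs[i], seqs[j]))
        for p in range(length):
            # Reward columns that differ from the nearest miss,
            # penalize columns that differ from the nearest hit.
            weights[p] += (seqs[i][p] != seqs[miss][p]) - (seqs[i][p] != seqs[hit][p])
        used += 1
    return [w / used for w in weights] if used else weights


# Toy alignment (hypothetical): column 1 separates the two classes,
# column 3 varies within each class, columns 0 and 2 are fully conserved.
toy_seqs = ["ACDA", "ACDG", "AKDA", "AKDG"]
toy_labels = [0, 0, 1, 1]
toy_weights = relief_weights(toy_seqs, toy_labels)
```

In this toy case the class-separating column receives the highest weight, the column that varies within classes receives a negative weight, and fully conserved columns stay at zero, which is the 'local' contrast between nearest same-class and nearest other-class sequences that the weighting relies on.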
A web-server implementation of multi-RELIEF is available at www.ibi.vu.nl/programs/multirelief. Matlab source code of the algorithm and data sets are available on request for academic use.