Institute of Biophysics and Physical Biochemistry, University of Regensburg, D-93040 Regensburg, Germany and Faculty of Mathematics and Computer Science, University of Hagen, D-58084 Hagen, Germany.
Bioinformatics. 2013 Dec 1;29(23):3029-35. doi: 10.1093/bioinformatics/btt519. Epub 2013 Sep 18.
The precise identification of functionally and structurally important residues of a protein is still an open problem, and state-of-the-art classifiers predict only one or at most two different categories.
We have implemented the classifier CLIPS-4D, which predicts in a mutually exclusive manner a role in catalysis, ligand binding or protein stability for each residue position of a protein. Each prediction is assigned a P-value, which enables the statistical assessment and the selection of predictions with similar quality. CLIPS-4D requires as input a multiple sequence alignment and a 3D structure of one protein in PDB format. A comparison with existing methods confirmed state-of-the-art prediction quality, even though CLIPS-4D classifies more specifically than other methods. CLIPS-4D was implemented as a multiclass support vector machine, which exploits seven sequence-based and two structure-based features, each of which was shown to contribute to classification quality. The classification of ligand-binding sites profited most from the 3D features, which were the assessment of the solvent-accessible surface area and the identification of surface pockets. In contrast, five additionally tested 3D features did not increase the classification performance beyond that achieved with evolutionary signals deduced from the multiple sequence alignment.
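The combination of a mutually exclusive class assignment with a per-prediction P-value can be illustrated with a minimal sketch. The abstract does not state how CLIPS-4D derives its P-values, so everything below is an assumption made for illustration: the function names, the class labels, the toy scores, and the use of an empirical P-value (the fraction of background scores at least as high as the observed one) are hypothetical and are not the authors' implementation.

```python
from bisect import bisect_left

# Hypothetical class labels; the three roles named in the abstract.
CLASSES = ("catalysis", "ligand_binding", "stability")


def assign_class(scores):
    """Pick the single best-scoring class, so assignments are mutually exclusive."""
    return max(CLASSES, key=lambda c: scores[c])


def empirical_p_value(score, background_sorted):
    """Fraction of background scores >= score, with a +1 pseudocount.

    ASSUMPTION: an empirical P-value against a background score
    distribution; the source does not describe the actual procedure.
    """
    n = len(background_sorted)
    n_greater_equal = n - bisect_left(background_sorted, score)
    return (n_greater_equal + 1) / (n + 1)


def classify_residues(residue_scores, backgrounds, p_cutoff):
    """Return {position: (label or None, p_value)} for every residue position.

    Predictions whose P-value exceeds p_cutoff are left unlabelled (None).
    """
    sorted_bg = {c: sorted(values) for c, values in backgrounds.items()}
    results = {}
    for position, scores in residue_scores.items():
        label = assign_class(scores)
        p = empirical_p_value(scores[label], sorted_bg[label])
        results[position] = (label if p <= p_cutoff else None, p)
    return results


# Toy data (invented): per-class scores for two residue positions and a
# small background score sample per class.
toy_backgrounds = {c: [0.1, 0.2, 0.3, 0.4] for c in CLASSES}
toy_scores = {
    10: {"catalysis": 0.9, "ligand_binding": 0.2, "stability": 0.1},
    11: {"catalysis": 0.1, "ligand_binding": 0.25, "stability": 0.2},
}
example = classify_residues(toy_scores, toy_backgrounds, p_cutoff=0.25)
```

With these toy values, position 10 is labelled `catalysis` because its score exceeds every background score, whereas position 11 stays unlabelled because its best score lies in the middle of the background distribution. Filtering on the P-value rather than on the raw score is what would make predictions from different classes comparable in quality.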