Tong Wenxu, Williams Ronald J, Wei Ying, Murga Leonel F, Ko Jaeju, Ondrechen Mary Jo
College of Computer and Information Science, Northeastern University, Boston, Massachusetts 02115, USA.
Protein Sci. 2008 Feb;17(2):333-41. doi: 10.1110/ps.073213608. Epub 2007 Dec 20.
Theoretical microscopic titration curves (THEMATICS) is a computational method for identifying active sites in proteins through deviations in the computed titration behavior of ionizable residues. While its sensitivity to catalytic sites is high, the previously reported sensitivity to individual catalytic residues was lower, about 50%. Here THEMATICS is combined with support vector machines (SVM) to improve sensitivity for catalytic residue prediction from protein 3D structure alone. For a test set of 64 proteins taken from the Catalytic Site Atlas (CSA), the average recall rate for annotated catalytic residues is 61%; good precision is maintained, with only 4% of all residues selected. The average false positive rate, using the CSA annotations, is only 3.2%, far lower than that of other 3D-structure-based methods. THEMATICS-SVM returns higher precision, a lower false positive rate, and better overall performance than other 3D-structure-based methods. Comparison is also made with the latest machine learning methods that are based on both sequence alignments and 3D structures. For annotated sets of well-characterized enzymes, THEMATICS-SVM performance compares very favorably with methods that utilize sequence homology. Moreover, since THEMATICS depends only on the 3D structure of the query protein, no decline in performance is expected when it is applied to novel folds, proteins with few sequence homologues, or even orphan sequences. An extension of the method to predict non-ionizable catalytic residues is also presented. THEMATICS-SVM predicts a local network of ionizable residues with strong interactions between protonation events; this appears to be a special feature of enzyme active sites.
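The recall, precision, false positive rate, and fraction of residues selected quoted above are standard per-residue classification metrics. The following is a minimal sketch of how such figures could be computed for one protein and then averaged over a test set. It is not the authors' evaluation code: the names `residue_metrics` and `average_metrics` are hypothetical, the residue numbers are made up for illustration, and a simple unweighted per-protein average is assumed because the abstract does not spell out the averaging convention.

```python
from dataclasses import dataclass


@dataclass
class ResidueMetrics:
    recall: float               # TP / (TP + FN): share of annotated residues recovered
    precision: float            # TP / (TP + FP): share of predictions that are annotated
    false_positive_rate: float  # FP / (FP + TN): share of non-annotated residues predicted
    fraction_selected: float    # predicted residues / all residues in the protein


def residue_metrics(predicted, annotated, n_residues):
    """Per-protein metrics from sets of residue indices (hypothetical helper)."""
    predicted, annotated = set(predicted), set(annotated)
    tp = len(predicted & annotated)
    fp = len(predicted - annotated)
    fn = len(annotated - predicted)
    tn = n_residues - tp - fp - fn
    return ResidueMetrics(
        recall=tp / (tp + fn) if (tp + fn) else 0.0,
        precision=tp / (tp + fp) if (tp + fp) else 0.0,
        false_positive_rate=fp / (fp + tn) if (fp + tn) else 0.0,
        fraction_selected=len(predicted) / n_residues,
    )


def average_metrics(per_protein):
    """Unweighted mean over proteins (an assumed averaging convention)."""
    n = len(per_protein)
    return ResidueMetrics(
        recall=sum(m.recall for m in per_protein) / n,
        precision=sum(m.precision for m in per_protein) / n,
        false_positive_rate=sum(m.false_positive_rate for m in per_protein) / n,
        fraction_selected=sum(m.fraction_selected for m in per_protein) / n,
    )


if __name__ == "__main__":
    # Illustrative, invented data: two proteins with annotated catalytic residues.
    a = residue_metrics(predicted={10, 25, 40, 41}, annotated={10, 25, 60}, n_residues=100)
    b = residue_metrics(predicted={1}, annotated={1}, n_residues=50)
    print(average_metrics([a, b]))
```

Because catalytic residues are a small minority of any protein, the false positive rate can look low even when precision is modest, which is why the abstract reports both alongside the fraction of residues selected.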