University of Hamburg, Center for Bioinformatics, Bundesstr. 43, 20146 Hamburg, Germany.
Proteins. 2013 Mar;81(3):479-89. doi: 10.1002/prot.24205. Epub 2012 Dec 24.
Due to the rising number of solved protein structures, computer-based techniques for automatic functional annotation of proteins and their classification into families are of high scientific interest. DoGSiteScorer automatically calculates global descriptors for self-predicted pockets based on the 3D structure of a protein. Using a support vector machine (SVM), protein function predictors are built on three levels of increasing granularity, based on descriptors of 26,632 pockets from enzymes with known structure and enzyme classification. The SVM models represent a generalization of the available descriptor space for each enzyme class, subclass, and substrate-specific sub-subclass. Cross-validation studies show an accuracy of 68.2% for predicting the correct main class and accuracies between 62.8% and 80.9% for the six subclasses. The substrate-specific recall rate for a kinase subset is 53.8%. Furthermore, application studies demonstrate the method's ability to predict the function of proteins of unknown function and to provide valuable information for the function prediction field.
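To make the three prediction levels and the reported metrics concrete, the following is a minimal sketch of how labels at class, subclass, and sub-subclass granularity could be derived from EC numbers, and how accuracy and per-label recall could be computed from predictions. It is not the authors' implementation: the helper names (`ec_levels`, `accuracy`, `recall_per_label`) and the example EC numbers are hypothetical, the predictions are invented for illustration, and no SVM is trained here.

```python
from collections import defaultdict


def ec_levels(ec_number):
    """Split an EC number such as '2.7.11.1' into labels at three granularities.

    Hypothetical helper: assumes the three prediction levels correspond to
    the first one, two, and three fields of the EC number.
    """
    parts = ec_number.split(".")
    return {
        "class": parts[0],
        "subclass": ".".join(parts[:2]),
        "sub_subclass": ".".join(parts[:3]),
    }


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def recall_per_label(true_labels, predicted_labels):
    """For each true label, the fraction of its members predicted correctly."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(true_labels, predicted_labels):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Invented labels for illustration only; these are not data from the study.
true_ec = ["2.7.11.1", "2.7.10.2", "3.4.21.4", "1.1.1.1"]
pred_ec = ["2.7.11.1", "2.7.11.1", "3.4.21.4", "2.7.1.1"]

for level in ("class", "subclass", "sub_subclass"):
    t = [ec_levels(ec)[level] for ec in true_ec]
    p = [ec_levels(ec)[level] for ec in pred_ec]
    print(level, accuracy(t, p), recall_per_label(t, p))
```

In a cross-validation setting, such metrics would be computed on the held-out fold of each split and then averaged. Finer levels are generally harder to predict because the same pockets are divided among more labels.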