Fernández Michael, Caballero Julio, Fernández Leyden, Abreu Jose Ignacio, Acosta Gianco
Molecular Modeling Group, Center for Biotechnological Studies, Faculty of Agronomy, University of Matanzas, 44740 Matanzas, Cuba.
Proteins. 2008 Jan 1;70(1):167-75. doi: 10.1002/prot.21524.
This work reports a novel 3D pseudo-folding graph representation of protein sequences for modeling purposes. Amino acid Euclidean distance matrices (EDMs) encode the primary structural information. Amino Acid Pseudo-Folding 3D Distances Count (AAp3DC) descriptors, calculated from the EDMs of a large data set of 1363 single mutants of 64 proteins, were tested for building a classifier of the sign of the change in thermal unfolding Gibbs free energy (ΔΔG) upon single mutations. An optimum support vector machine (SVM) with a radial basis function (RBF) kernel recognized stable and unstable mutants with accuracies above 70% in cross-validation. To the best of our knowledge, this result for stable mutant recognition is the highest ever reported for a sequence-based predictor on a data set of more than 1000 mutants. Furthermore, the model adequately classified disease-associated mutations of human prion protein and human transthyretin.
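The pipeline summarized above (residue coordinates → Euclidean distance matrix → per-amino-acid distance counts → RBF-kernel SVM) can be illustrated with a minimal sketch. It rests on loud assumptions. The abstract does not give the pseudo-folding rule that places residues in 3D, so the coordinates here are simply an input. The exact AAp3DC definition is also not reproduced: `distance_counts` is a hypothetical stand-in that, for each amino-acid type, counts residue pairs involving that type whose distance falls within each cutoff. Only `rbf_kernel` is the standard RBF kernel used by such SVMs, K(x, y) = exp(-γ‖x − y‖²).

```python
import math
from itertools import combinations


def euclidean_distance_matrix(coords):
    """Full symmetric matrix of pairwise Euclidean distances between 3D points."""
    n = len(coords)
    edm = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = math.dist(coords[i], coords[j])
        edm[i][j] = edm[j][i] = d
    return edm


def distance_counts(sequence, coords, cutoffs):
    """HYPOTHETICAL distance-count descriptor (not the paper's exact AAp3DC).

    For each amino-acid type, count the residue pairs involving that type
    whose Euclidean distance is <= each cutoff. `coords` stands in for the
    pseudo-folding coordinates, whose construction is not given here.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    edm = euclidean_distance_matrix(coords)
    counts = {}
    for i, j in combinations(range(len(sequence)), 2):
        # A pair of two identical residue types is counted once for that type.
        for aa in {sequence[i], sequence[j]}:
            row = counts.setdefault(aa, [0] * len(cutoffs))
            for k, cutoff in enumerate(cutoffs):
                if edm[i][j] <= cutoff:
                    row[k] += 1
    return counts


def rbf_kernel(x, y, gamma):
    """Standard RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


if __name__ == "__main__":
    # Toy example: three residues on a line (made-up coordinates).
    seq = "AGA"
    xyz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    print(distance_counts(seq, xyz, cutoffs=[1.5, 5.0]))
```

In a real model, descriptor vectors computed for wild-type and mutant sequences would be fed to an SVM whose decision function is built from `rbf_kernel` evaluations. The cutoffs, the choice of γ, and the way mutant and wild-type descriptors are combined are all left open here.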