Department of Biological Science, Purdue University, 249 S. Martin Jischke Street, West Lafayette, IN, USA.
Department of Computer Science, Purdue University, 305 N. University Street, West Lafayette, IN, USA.
Sci Rep. 2017 Jan 11;7:40629. doi: 10.1038/srep40629.
Protein tertiary structure prediction methods have matured in recent years. However, some proteins still defy accurate prediction because of factors such as inadequate template structures. While existing model quality assessment methods predict global model quality relatively well, there is substantial room for improvement in local quality assessment, i.e., assessment of the error at each residue position in a model. Local quality is important information for practical applications of structure models, such as interpreting or designing site-directed mutagenesis of proteins. We have developed a novel local quality assessment method for protein tertiary structure models. The method, named Graph-based Model Quality assessment method (GMQ), explicitly considers the predicted quality of spatially neighboring residues using a graph representation of a query protein structure model. GMQ uses a conditional random field as the core of its algorithm and performs a binary prediction of the quality of each residue in a model, indicating whether a residue position is likely to be within an error cutoff. The accuracy of GMQ improved when larger graphs were considered, so that quality information from more surrounding residues was included. Moreover, we found that using different edge weights in the graphs to reflect different secondary structures further improves the accuracy. GMQ showed competitive performance on a benchmark for quality assessment of structure models from the Critical Assessment of Techniques for Protein Structure Prediction (CASP).
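The graph-based idea described in the abstract can be illustrated with a minimal sketch. This is not the GMQ implementation: the 8 Å distance cutoff, the per-residue unary scores, the secondary-structure edge weights, and the function names (`build_graph`, `edge_weight`, `icm_labels`) are all assumptions made for illustration, and a simple iterated conditional modes (ICM) update stands in for trained conditional random field inference.

```python
import math
from itertools import combinations


def build_graph(coords, cutoff=8.0):
    """Connect residues whose representative atoms lie within `cutoff`
    angstroms of each other (the cutoff value is an assumption)."""
    edges = []
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            edges.append((i, j))
    return edges


def edge_weight(ss_i, ss_j, weights, default=0.5):
    """Look up a weight for a pair of secondary-structure types.
    The weight table is hypothetical, not taken from the paper."""
    return weights.get(tuple(sorted((ss_i, ss_j))), default)


def icm_labels(unary, edges, ss, weights, max_sweeps=20):
    """Assign a binary label to each residue (1 = likely within the
    error cutoff, 0 = likely outside it).

    unary[i] > 0 favors label 1 for residue i. Each edge adds its weight
    to the score when the two residues it joins share a label. ICM
    greedily updates one residue at a time until no label changes.
    """
    n = len(unary)
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        w = edge_weight(ss[i], ss[j], weights)
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))

    # Start from the labels suggested by the unary scores alone.
    labels = [1 if u > 0 else 0 for u in unary]
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            # Score of label 1 minus score of label 0 for residue i.
            score = unary[i]
            for j, w in neighbors[i]:
                score += w if labels[j] == 1 else -w
            new_label = 1 if score > 0 else 0
            if new_label != labels[i]:
                labels[i] = new_label
                changed = True
        if not changed:
            break
    return labels


# Toy example: five residues on a line, 3.8 angstroms apart, all helical.
coords = [(3.8 * i, 0.0, 0.0) for i in range(5)]
unary = [1.0, 1.2, -0.3, 0.9, 1.1]   # hypothetical per-residue scores
ss = ["H"] * 5
weights = {("H", "H"): 0.5}          # hypothetical edge weight

edges = build_graph(coords)
labels = icm_labels(unary, edges, ss, weights)
print(labels)
```

In this toy input the third residue has a weakly negative score on its own, but its four confidently positive neighbors outweigh it, so all five residues end up labeled 1. That is the intuition behind letting the predicted quality of spatial neighbors inform each residue's label.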