How well can the accuracy of comparative protein structure models be predicted?

Authors

Eramian David, Eswar Narayanan, Shen Min-Yi, Sali Andrej

Affiliation

Graduate Group in Biophysics, University of California at San Francisco, California 94158, USA.

Publication information

Protein Sci. 2008 Nov;17(11):1881-93. doi: 10.1110/ps.036061.108. Epub 2008 Oct 1.

DOI:10.1110/ps.036061.108

PMID:18832340

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2578807/

Abstract

Comparative structure models are available for two orders of magnitude more protein sequences than are experimentally determined structures. These models, however, suffer from two limitations that experimentally determined structures do not: They frequently contain significant errors, and their accuracy cannot be readily assessed. We have addressed the latter limitation by developing a protocol optimized specifically for predicting the Cα root-mean-square deviation (RMSD) and native overlap (NO3.5Å) errors of a model in the absence of its native structure. In contrast to most traditional assessment scores that merely predict one model is more accurate than others, this approach quantifies the error in an absolute sense, thus helping to determine whether or not the model is suitable for intended applications. The assessment relies on a model-specific scoring function constructed by a support vector machine. This regression optimizes the weights of up to nine features, including various sequence similarity measures and statistical potentials, extracted from a tailored training set of models unique to the model being assessed: If possible, we use similarly sized models with the same fold; otherwise, we use similarly sized models with the same secondary structure composition. This protocol predicts the RMSD and NO3.5Å errors for a diverse set of 580,317 comparative models of 6174 sequences with correlation coefficients (r) of 0.84 and 0.86, respectively, to the actual errors. This scoring function achieves the best correlation compared to 13 other tested assessment criteria that achieved correlations ranging from 0.35 to 0.71.
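The two error measures named in the abstract, and the correlation coefficient used to report the results, can be illustrated with a short sketch. The abstract does not define them, so the definitions below are assumptions: RMSD is taken over paired Cα atoms, native overlap is taken as the fraction of Cα atoms lying within 3.5 Å of their native positions, and the model is assumed to be already superposed on the native structure (no superposition step is performed here). The function names `ca_rmsd`, `native_overlap`, and `pearson_r` are hypothetical and are not taken from the paper or from any library.

```python
import math

def ca_rmsd(model, native):
    """RMSD (in Å) over paired C-alpha coordinates.

    Assumes both lists hold (x, y, z) tuples in the same residue order
    and that the model has already been superposed on the native structure.
    """
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(m, n) ** 2 for m, n in zip(model, native))
    return math.sqrt(total / len(model))

def native_overlap(model, native, cutoff=3.5):
    """Fraction of C-alpha atoms within `cutoff` Å of their native position.

    The 3.5 Å default mirrors the NO3.5Å label; the definition is an assumption.
    """
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    close = sum(1 for m, n in zip(model, native) if math.dist(m, n) <= cutoff)
    return close / len(model)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between predicted and actual errors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy coordinates (made up for illustration, not data from the paper).
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 0.0, 2.0), (11.4, 4.0, 0.0)]

print(ca_rmsd(model, native))         # per-atom deviations are 0, 1, 2 and 4 Å
print(native_overlap(model, native))  # three of four atoms fall within 3.5 Å
```

In the setting the abstract describes, the native structure is unavailable, so quantities like these would be the regression targets that the scoring function is trained to predict, and `pearson_r` would compare those predictions with the actual errors on a benchmark set.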

Similar articles

Estimating quality of template-based protein models by alignment stability.

Proteins. 2008 May 15;71(3):1255-74. doi: 10.1002/prot.21819.

TOUCHSTONE II: a new approach to ab initio protein structure prediction.

Biophys J. 2003 Aug;85(2):1145-64. doi: 10.1016/S0006-3495(03)74551-2.

FragQA: predicting local fragment quality of a sequence-structure alignment.

Genome Inform. 2007;19:27-39.

Sub-AQUA: real-value quality assessment of protein structure models.

Protein Eng Des Sel. 2010 Aug;23(8):617-32. doi: 10.1093/protein/gzq030. Epub 2010 Jun 4.

MODBASE, a database of annotated comparative protein structure models, and associated resources.

Nucleic Acids Res. 2004 Jan 1;32(Database issue):D217-22. doi: 10.1093/nar/gkh095.

PFRES: protein fold classification by using evolutionary information and predicted secondary structure.

Bioinformatics. 2007 Nov 1;23(21):2843-50. doi: 10.1093/bioinformatics/btm475. Epub 2007 Oct 17.

Effective optimization algorithms for fragment-assembly based protein structure prediction.

Comput Syst Bioinformatics Conf. 2006:19-29.

MetaMQAP: a meta-server for the quality assessment of protein models.

BMC Bioinformatics. 2008 Sep 29;9:403. doi: 10.1186/1471-2105-9-403.

ProVal: a protein-scoring function for the selection of native and near-native folds.

Proteins. 2004 Feb 1;54(2):289-302. doi: 10.1002/prot.10523.

Cited by

Bayesian Nonparametric Analysis of Residence Times for Protein-Lipid Interactions in Molecular Dynamics Simulations.

J Chem Theory Comput. 2025 Apr 22;21(8):4203-4220. doi: 10.1021/acs.jctc.4c01522. Epub 2025 Apr 2.

Bayesian nonparametric analysis of residence times for protein-lipid interactions in Molecular Dynamics simulations.

bioRxiv. 2025 Mar 4:2024.11.07.622502. doi: 10.1101/2024.11.07.622502.

Structural coverage of the human interactome.

Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad496.

M Protein from Dengue virus oligomerizes to pentameric channel protein: in silico analysis study.

Genomics Inform. 2023 Sep;21(3):e41. doi: 10.5808/gi.23035. Epub 2023 Sep 27.

Modeling the Characteristic Residues of Chlorophyll Synthase (ChlF) from to Determine Its Reaction Mechanism.

Microorganisms. 2023 Sep 13;11(9):2305. doi: 10.3390/microorganisms11092305.

The evolution of the HIV-1 protease folding stability.

Virus Evol. 2022 Dec 5;8(2):veac115. doi: 10.1093/ve/veac115. eCollection 2022.

Definition of the Acceptor Substrate Binding Specificity in Plant Xyloglucan Endotransglycosylases Using Computational Chemistry.

Int J Mol Sci. 2022 Oct 5;23(19):11838. doi: 10.3390/ijms231911838.

All-Atom Simulations Uncover Structural and Dynamical Properties of STING Proteins in the Membrane System.

J Chem Inf Model. 2022 Sep 26;62(18):4486-4499. doi: 10.1021/acs.jcim.2c00595. Epub 2022 Sep 14.

The tomato cytochrome P450 CYP712G1 catalyses the double oxidation of orobanchol en route to the rhizosphere signalling strigolactone, solanacol.

New Phytol. 2022 Sep;235(5):1884-1899. doi: 10.1111/nph.18272. Epub 2022 Jun 18.

A Benchmark Dataset for Evaluating Practical Performance of Model Quality Assessment of Homology Models.

Bioengineering (Basel). 2022 Mar 15;9(3):118. doi: 10.3390/bioengineering9030118.
