A sampling-based method for ranking protein structural models by integrating multiple scores and features.

Affiliation

College of Computer Science and Technology, Jilin University, Jilin, Changchun 130012, China.

Publication

Curr Protein Pept Sci. 2011 Sep;12(6):540-8. doi: 10.2174/138920311796957658.

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4368063/

Abstract

One of the major challenges in protein tertiary structure prediction is structure quality assessment. In many cases, protein structure prediction tools generate good structural models but fail to select the best ones from a huge number of candidates as the final output. In this study, we developed a sampling-based machine-learning method that ranks protein structural models by integrating multiple scores and features. First, features such as predicted secondary structure, solvent accessibility, and residue-residue contact information are integrated by two Radial Basis Function (RBF) models trained on different datasets. Then, the two RBF scores and five selected scoring functions developed by others, i.e., Opus-CA, Opus-PSP, DFIRE, RAPDF, and Cheng Score, are synthesized by a sampling method. Finally, another integrated RBF model ranks the structural models according to the features of the sampling distribution. We tested the proposed method on two different datasets: the CASP server prediction models for all CASP8 targets and a set of models generated by our in-house software MUFOLD. The test results show that our method outperforms every individual scoring function in both best-model selection and the overall correlation between the predicted ranking and the actual ranking of structural quality.
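The RBF feature-integration step can be illustrated with a minimal Gaussian RBF network. This is a sketch under stated assumptions, not the authors' implementation: the class name `RBFScorer`, the three example features (agreement with predicted secondary structure, agreement with predicted solvent accessibility, and fraction of predicted contacts satisfied), and the hand-set centers, widths, and weights are all hypothetical. Training, which for a typical RBF network means choosing centers (for example by clustering) and fitting the output weights by least squares, is omitted.

```python
import math
from typing import List, Sequence


def gaussian_rbf(x: Sequence[float], center: Sequence[float], width: float) -> float:
    """Gaussian basis function: exp(-||x - center||^2 / (2 * width^2))."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))


class RBFScorer:
    """Hypothetical RBF network mapping one model's feature vector to a quality score.

    score(x) = bias + sum_k weight_k * gaussian_rbf(x, center_k, width_k)
    """

    def __init__(
        self,
        centers: List[List[float]],
        widths: List[float],
        weights: List[float],
        bias: float = 0.0,
    ) -> None:
        if not (len(centers) == len(widths) == len(weights)):
            raise ValueError("centers, widths and weights must have the same length")
        if any(w <= 0 for w in widths):
            raise ValueError("widths must be positive")
        self.centers = centers
        self.widths = widths
        self.weights = weights
        self.bias = bias

    def score(self, features: Sequence[float]) -> float:
        return self.bias + sum(
            w * gaussian_rbf(features, c, s)
            for c, s, w in zip(self.centers, self.widths, self.weights)
        )


# Hypothetical per-model features, each scaled to [0, 1]:
# [secondary-structure agreement, solvent-accessibility agreement, contact satisfaction]
scorer = RBFScorer(
    centers=[[0.9, 0.8, 0.7], [0.4, 0.4, 0.2]],  # "good-model" and "poor-model" prototypes
    widths=[0.3, 0.3],
    weights=[1.0, -0.5],
)
good = scorer.score([0.85, 0.80, 0.65])
poor = scorer.score([0.40, 0.45, 0.20])
print(f"good={good:.3f} poor={poor:.3f}")
```

Two such networks trained on different datasets would each yield one score per model; in the method described above, those two scores then enter the sampling step alongside the five external scoring functions.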

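The abstract does not say what is sampled in the synthesis step, so the following is only one plausible reading, labeled as an assumption throughout: repeatedly draw random subsets of candidate models, rank each subset under each score, and summarize each model's normalized within-subset rank by its mean and standard deviation. Those summary statistics would then play the role of the "features of the sampling distribution" consumed by the final RBF model. The function name `sampled_rank_features` and its parameters `n_samples` and `subset_size` are hypothetical, and all scores are assumed to be oriented so that higher means better (energy-style potentials, where lower is better, would need their sign flipped first).

```python
import random
import statistics
from typing import Dict, List, Sequence


def sampled_rank_features(
    scores: Sequence[Sequence[float]],
    n_samples: int = 200,
    subset_size: int = 5,
    seed: int = 0,
) -> List[Dict[str, float]]:
    """Hypothetical sampling step.

    scores[m][j] is score j for model m (higher = better). Returns, per model,
    the mean and population standard deviation of its normalized rank
    (1.0 = best in the subset, 0.0 = worst) over all sampled subsets and scores.
    """
    n_models = len(scores)
    if not 2 <= subset_size <= n_models:
        raise ValueError("subset_size must be between 2 and the number of models")
    n_scores = len(scores[0])
    rng = random.Random(seed)
    ranks: List[List[float]] = [[] for _ in range(n_models)]

    for _ in range(n_samples):
        subset = rng.sample(range(n_models), subset_size)
        for j in range(n_scores):
            # Order the sampled models best-first under score j.
            ordered = sorted(subset, key=lambda m: scores[m][j], reverse=True)
            for position, m in enumerate(ordered):
                ranks[m].append(1.0 - position / (subset_size - 1))

    features = []
    for r in ranks:
        if r:
            features.append({"mean": statistics.fmean(r), "stdev": statistics.pstdev(r)})
        else:  # model never drawn; leave its features undefined
            features.append({"mean": float("nan"), "stdev": float("nan")})
    return features


# Six hypothetical models scored by three functions; model 0 is best under every score.
demo_scores = [[5, 5, 5], [4, 3, 4], [3, 4, 3], [2, 2, 2], [1, 1, 1], [0, 0, 0]]
demo_features = sampled_rank_features(demo_scores)
for m, f in enumerate(demo_features):
    print(m, round(f["mean"], 3), round(f["stdev"], 3))
```

A high mean with a low spread marks a model whose relative standing is stable across scoring functions and sampled subsets, which is the kind of signal a distribution-based ranker could exploit.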

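The two evaluation criteria in the abstract, best-model selection and the correlation between predicted and actual rankings, can be computed as sketched below. This assumes that "actual quality" is a per-model similarity to the native structure such as GDT-TS (the abstract does not name the measure), reports best-model selection as the quality lost by picking the top-predicted model instead of the true best, and uses Spearman rank correlation; these choices and the function names are illustrative assumptions.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ascending ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def spearman(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(predicted), average_ranks(actual))


def best_model_loss(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Actual quality of the true best model minus that of the top-predicted model."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(actual) - actual[top]


# Hypothetical predicted scores and GDT-TS-like actual qualities for five models.
pred = [0.62, 0.71, 0.55, 0.80, 0.40]
gdt = [48.0, 61.0, 50.0, 58.0, 35.0]
print("Spearman:", round(spearman(pred, gdt), 3))  # Spearman: 0.8
print("best-model loss:", best_model_loss(pred, gdt))  # best-model loss: 3.0
```

A loss of 0 means the top-ranked model is also the truly best one; averaging the loss and the correlation over all targets gives a per-dataset summary that can be compared across scoring functions.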
References

1

Universal Approximation Using Radial-Basis-Function Networks.

Neural Comput. 1991 Summer;3(2):246-257. doi: 10.1162/neco.1991.3.2.246.

2

MUFOLD: A new solution for protein 3D structure prediction.

Proteins. 2010 Apr;78(5):1137-52. doi: 10.1002/prot.22634.

3

Global and local model quality estimation at CASP8 using the scoring functions QMEAN and QMEANclust.

Proteins. 2009;77 Suppl 9:173-80. doi: 10.1002/prot.22532.

4

Assessment of global and local model quality in CASP8 using Pcons and ProQ.

Proteins. 2009;77 Suppl 9:167-72. doi: 10.1002/prot.22476.

5

Prediction of global and local quality of CASP8 models by MULTICOM series.

Proteins. 2009;77 Suppl 9:181-4. doi: 10.1002/prot.22487.

6

Quality assessment of protein structure models.

Curr Protein Pept Sci. 2009 Jun;10(3):216-28. doi: 10.2174/138920309788452173.

7

Evaluating the absolute quality of a single protein model using structural features and support vector machines.

Proteins. 2009 May 15;75(3):638-47. doi: 10.1002/prot.22275.

8

How well can the accuracy of comparative protein structure models be predicted?

Protein Sci. 2008 Nov;17(11):1881-93. doi: 10.1110/ps.036061.108. Epub 2008 Oct 1.

9

Threading without optimizing weighting factors for scoring function.

Proteins. 2008 Nov 15;73(3):581-96. doi: 10.1002/prot.22082.

10

OPUS-PSP: an orientation-dependent statistical all-atom potential derived from side-chain packing.

J Mol Biol. 2008 Feb 8;376(1):288-301. doi: 10.1016/j.jmb.2007.11.033. Epub 2007 Nov 19.
