Random forest-based protein model quality assessment (RFMQA) using structural features and potential energy terms.

Authors

Manavalan Balachandran, Lee Juyong, Lee Jooyoung

Affiliation

Center for In Silico Protein Science, School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea.

Publication details

PLoS One. 2014 Sep 15;9(9):e106542. doi: 10.1371/journal.pone.0106542. eCollection 2014.

DOI:10.1371/journal.pone.0106542
PMID:25222008
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4164442/
Abstract

Recently, predicting a protein's three-dimensional (3D) structure from its sequence information has made significant progress, owing to advances in computational techniques and the growth in the number of experimental structures. However, selecting good models from a pool of structural models remains an important and challenging task in protein structure prediction. In this study, we present the first application of random forest-based model quality assessment (RFMQA) to rank protein models using their structural features and knowledge-based potential energy terms. The method predicts a relative score for a model from its secondary structure, solvent accessibility, and knowledge-based potential energy terms. We trained and tested the RFMQA method on CASP8 and CASP9 targets using 5-fold cross-validation. The correlation coefficient between the TM-score of the model selected by RFMQA (TMRF) and that of the best server model (TMbest) is 0.945. We benchmarked our method on recent CASP10 targets, using CASP8 and CASP9 server models as the training set. The correlation coefficient and average difference between TMRF and TMbest over 95 CASP10 targets are 0.984 and 0.0385, respectively. The test results show that our method selects top models better than other top-performing methods. RFMQA is available for download from http://lee.kias.re.kr/RFMQA/RFMQA_eval.tar.gz.
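
The two benchmark figures quoted in the abstract (the correlation between TMRF and TMbest, and their average difference) can be illustrated with a minimal sketch. This is not the RFMQA code: the function names, the toy scores, and the sign convention for the difference (TMbest minus TMRF) are assumptions made for illustration. For each target, the model with the highest predicted score is selected, and its TM-score is compared with the best TM-score in that target's pool.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def evaluate_selection(targets):
    """Compare the top-ranked model of each target with the best model.

    `targets` maps a target id to a list of (predicted_score, tm_score)
    pairs, one pair per candidate model (hypothetical layout).
    Returns (correlation, average_difference, tm_selected, tm_best).
    """
    tm_selected, tm_best = [], []
    for models in targets.values():
        # Model the quality-assessment method would pick: highest predicted score.
        top = max(models, key=lambda m: m[0])
        tm_selected.append(top[1])
        # Best model actually present in the pool: highest true TM-score.
        tm_best.append(max(tm for _, tm in models))
    corr = pearson(tm_selected, tm_best)
    # Assumed convention: average of (TMbest - TMselected) over targets.
    avg_diff = sum(b - s for s, b in zip(tm_selected, tm_best)) / len(tm_best)
    return corr, avg_diff, tm_selected, tm_best


# Toy data, invented for illustration only (not CASP results).
toy_targets = {
    "T1": [(0.9, 0.80), (0.5, 0.85), (0.2, 0.40)],
    "T2": [(0.7, 0.55), (0.3, 0.30)],
    "T3": [(0.1, 0.20), (0.6, 0.35), (0.4, 0.30)],
}

corr, avg_diff, tm_selected, tm_best = evaluate_selection(toy_targets)
print(f"correlation={corr:.3f}, average difference={avg_diff:.4f}")
```

In this toy pool the selection misses the best model only for target T1, so the average difference is small but nonzero. A perfect selector would give an average difference of 0.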


