
Quality assessment of modeled protein structure using physicochemical properties.

Author information

Rana Prashant Singh, Sharma Harish, Bhattacharya Mahua, Shukla Anupam

Affiliation

Department of Information Communication and Technology, ABV-Indian Institute of Information Technology and Management, Gwalior MP-474015, India.

Publication information

J Bioinform Comput Biol. 2015 Apr;13(2):1550005. doi: 10.1142/S0219720015500055. Epub 2014 Dec 19.

DOI: 10.1142/S0219720015500055
PMID: 25524475

Abstract

Physicochemical properties of proteins help determine the quality of a protein structure, and they have therefore been widely used to distinguish native or native-like structures from other predicted structures. In this work, we explore nine machine learning methods with six physicochemical properties to predict the Root Mean Square Deviation (RMSD), Template Modeling score (TM-score), and Global Distance Test score (GDT_TS-score) of a modeled protein structure in the absence of its true native state. The physicochemical properties used are total surface area, Euclidean distance (ED), total empirical energy, secondary structure penalty (SS), sequence length (SL), and pair number (PN). The data set contains a total of 95,091 modeled structures of 4896 native targets. A real-coded Self-adaptive Differential Evolution algorithm (SaDE) is used to determine feature importance. K-fold cross-validation is used to measure the robustness of the best predictive method. Intensive experiments show that the Random Forest method outperforms the other machine learning methods. This work makes the prediction faster and less expensive. On the testing data set, the predictions of RMSD, TM-score, and GDT_TS-score achieve a Root Mean Square Error (RMSE) of 1.20, 0.06, and 0.06, respectively; correlation scores of 0.96, 0.92, and 0.91, respectively; R² values of 0.92, 0.85, and 0.84, respectively; and accuracies of 78.82% (with ± 0.1 err), 86.56% (with ± 0.1 err), and 87.37% (with ± 0.1 err), respectively. The data set used in the study is available as a supplement at http://bit.ly/RF-PCP-DataSets.
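The four evaluation measures named in the abstract (RMSE, correlation, R², and accuracy within an error tolerance) can be illustrated with a minimal sketch. This is not the authors' code. The function names and the toy score values are hypothetical, the correlation is assumed to be Pearson's r, and "accuracy (with ± 0.1 err)" is assumed to mean the fraction of predictions whose absolute error is at most the tolerance; the abstract does not spell out either definition.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error between true and predicted scores."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(y_true, y_pred):
    """Pearson correlation (assumed to be the abstract's 'correlation score')."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy_within(y_true, y_pred, tol):
    """Fraction of predictions with |error| <= tol (an assumed reading
    of the abstract's 'accuracy (with +/- 0.1 err)')."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tol)
    return hits / len(y_true)


# Hypothetical TM-score-like values, made up purely for illustration.
true_scores = [0.80, 0.55, 0.30, 0.92, 0.10]
pred_scores = [0.75, 0.60, 0.45, 0.90, 0.15]

print("RMSE:", rmse(true_scores, pred_scores))
print("Pearson r:", pearson_r(true_scores, pred_scores))
print("R^2:", r_squared(true_scores, pred_scores))
print("Accuracy within 0.1:", accuracy_within(true_scores, pred_scores, 0.1))
```

In a setup like the one described, `pred_scores` would come from a regressor (for example a random forest) applied to the six physicochemical features of held-out structures, with the same measures averaged over the K cross-validation folds.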

