Using random forest algorithm to predict β-hairpin motifs.

Authors

Jia Shao-Chun, Hu Xiu-Zhen

Affiliation

College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, P.R. China.

Publication

Protein Pept Lett. 2011 Jun;18(6):609-17. doi: 10.2174/092986611795222777.

DOI: 10.2174/092986611795222777
PMID: 21309739

Abstract

A novel method is presented for predicting β-hairpin motifs in protein sequences: a Random Forest algorithm based on multiple feature parameters, namely position-specific amino acid composition, position-specific hydropathy composition, predicted secondary structure information, and auto-correlation function values. First, the method is trained and tested on a set of 8,291 β-hairpin motifs and 6,865 non-β-hairpin motifs. The overall accuracy and Matthews correlation coefficient reach 82.2% and 0.64 under 5-fold cross-validation, and 81.7% and 0.63 on the independent test. Second, the method is tested on a set of 4,884 β-hairpin motifs and 4,310 non-β-hairpin motifs used in previous studies. The overall accuracy and Matthews correlation coefficient reach 80.9% and 0.61 under 5-fold cross-validation, and 80.6% and 0.60 on the independent test, which improves on the previously reported results. Third, with the 4,884 β-hairpin motifs and 4,310 non-β-hairpin motifs as the training set and the 8,291 β-hairpin motifs and 6,865 non-β-hairpin motifs as the independent test set, the overall accuracy and Matthews correlation coefficient reach 81.5% and 0.63.
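
The two metrics reported in the abstract, overall accuracy and the Matthews correlation coefficient (MCC), can both be computed from a binary confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' code. The function name `accuracy_and_mcc` and the example counts are hypothetical, chosen only for illustration and not taken from the paper.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (overall accuracy, Matthews correlation coefficient).

    tp/fn: β-hairpin motifs predicted correctly / incorrectly.
    tn/fp: non-β-hairpin motifs predicted correctly / incorrectly.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    acc = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is taken as 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Hypothetical counts for illustration only (not results from the paper).
acc, mcc = accuracy_and_mcc(tp=80, tn=70, fp=30, fn=20)
print(f"accuracy = {acc:.3f}, MCC = {mcc:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes differ in size, as they do in both datasets described above.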


Similar articles

1
Using random forest algorithm to predict β-hairpin motifs.
Protein Pept Lett. 2011 Jun;18(6):609-17. doi: 10.2174/092986611795222777.
2
Recognition of beta-hairpin motifs in proteins by using the composite vector.
Amino Acids. 2010 Mar;38(3):915-21. doi: 10.1007/s00726-009-0299-7. Epub 2009 May 6.
3
Identify Beta-Hairpin Motifs with Quadratic Discriminant Algorithm Based on the Chemical Shifts.
PLoS One. 2015 Sep 30;10(9):e0139280. doi: 10.1371/journal.pone.0139280. eCollection 2015.
4
Beta-hairpin prediction with quadratic discriminant analysis using diversity measure.
J Comput Chem. 2009 Nov 15;30(14):2277-84. doi: 10.1002/jcc.21229.
5
Prediction of the beta-hairpins in proteins using support vector machine.
Protein J. 2008 Feb;27(2):115-22. doi: 10.1007/s10930-007-9114-z.
6
Using feature optimization-based support vector machine method to recognize the β-hairpin motifs in enzymes.
Saudi J Biol Sci. 2017 Sep;24(6):1361-1369. doi: 10.1016/j.sjbs.2016.11.014. Epub 2016 Nov 28.
7
Statistical geometry based prediction of nonsynonymous SNP functional effects using random forest and neuro-fuzzy classifiers.
Proteins. 2008 Jun;71(4):1930-9. doi: 10.1002/prot.21838.
8
Strand-loop-strand motifs: prediction of hairpins and diverging turns in proteins.
Proteins. 2004 Feb 1;54(2):282-8. doi: 10.1002/prot.10589.
9
A systematic analysis of the beta hairpin motif in the Protein Data Bank.
Protein Sci. 2021 Mar;30(3):613-623. doi: 10.1002/pro.4020. Epub 2021 Jan 7.
10
Prediction of RNA-binding residues in proteins from primary sequence using an enriched random forest model with a novel hybrid feature.
Proteins. 2011 Apr;79(4):1230-9. doi: 10.1002/prot.22958. Epub 2011 Jan 25.

Cited by

1
Recent Advances in Computational Prediction of Secondary and Supersecondary Structures from Protein Sequences.
Methods Mol Biol. 2025;2870:1-19. doi: 10.1007/978-1-0716-4213-9_1.
2
A Novel Machine Learning Strategy for the Prediction of Antihypertensive Peptides Derived from Food with High Efficiency.
Foods. 2021 Mar 6;10(3):550. doi: 10.3390/foods10030550.
3
Recognizing Ion Ligand-Binding Residues by Random Forest Algorithm Based on Optimized Dihedral Angle.
Front Bioeng Biotechnol. 2020 Jun 12;8:493. doi: 10.3389/fbioe.2020.00493. eCollection 2020.
4
The recognition of multi-class protein folds by adding average chemical shifts of secondary structure elements.
Saudi J Biol Sci. 2016 Mar;23(2):189-97. doi: 10.1016/j.sjbs.2015.10.008. Epub 2015 Dec 11.
5
Prediction of complex super-secondary structure βαβ motifs based on combined features.
Saudi J Biol Sci. 2016 Jan;23(1):66-71. doi: 10.1016/j.sjbs.2015.10.005. Epub 2015 Nov 12.
6
An Ensemble Method to Distinguish Bacteriophage Virion from Non-Virion Proteins Based on Protein Sequence Characteristics.
Int J Mol Sci. 2015 Sep 9;16(9):21734-58. doi: 10.3390/ijms160921734.
7
Discriminating between deleterious and neutral non-frameshifting indels based on protein interaction networks and hybrid properties.
Mol Genet Genomics. 2015 Feb;290(1):343-52. doi: 10.1007/s00438-014-0922-5. Epub 2014 Sep 24.
8
Recognition of 27-class protein folds by adding the interaction of segments and motif information.
Biomed Res Int. 2014;2014:262850. doi: 10.1155/2014/262850. Epub 2014 Jul 21.
9
Prediction and Analysis of Post-Translational Pyruvoyl Residue Modification Sites from Internal Serines in Proteins.
PLoS One. 2013 Jun 21;8(6):e66678. doi: 10.1371/journal.pone.0066678. Print 2013.
10
Comprehensive comparative analysis and identification of RNA-binding protein domains: multi-class classification and feature selection.
J Theor Biol. 2012 Nov 7;312:65-75. doi: 10.1016/j.jtbi.2012.07.013. Epub 2012 Aug 3.