Predicting structural class for protein sequences of 40% identity based on features of primary and secondary structure using Random Forest algorithm.

Affiliations

Dharmsinh Desai University, Department of Computer Engineering, Faculty of Technology, D D University, Nadiad, 387001, India.

Research and Development Center, Faculty of Technology, Dharmsinh Desai University, Nadiad, 387001, India.

Publication information

Comput Biol Chem. 2020 Feb;84:107164. doi: 10.1016/j.compbiolchem.2019.107164. Epub 2019 Nov 15.

DOI:10.1016/j.compbiolchem.2019.107164
PMID:31806243
Abstract

At present, the discovery of protein tertiary structures lags far behind the discovery of primary structures. Predicting protein structural class with machine learning techniques can help reduce this gap. The Structural Classification of Proteins - extended (SCOPe 2.07) is the latest and largest dataset currently available. Protein sequences sharing less than 40% identity with one another are used to predict the α, β, α/β and α + β SCOPe classes. Sensitive features are extracted from the primary and secondary structure representations of proteins. Secondary structure features are extracted experimentally with respect to the frequency, pitch and spatial arrangement of secondary structure. Primary structure based features contain species information for each protein sequence, and these species parameters are further validated against the UniRef100 dataset using TaxId. Protein tertiary structure is known to be a manifestation of function, and functional differences are observed across species. Species are therefore expected to correlate strongly with structural class, which the current work confirms: adding species information improves prediction accuracy by 7% to 10%. A Random Forest classifier is trained on a subset of SCOPe 2.07 using a 65-dimensional feature vector, and testing on the rest of the set gives a consistent accuracy above 95%. The accuracies achieved on the benchmark datasets ASTRAL 1.73, 25PDB and FC699 exceed 86%, 91% and 97%, respectively, which to our knowledge are the best reported.
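The abstract does not list the 65 features, so the following is only a minimal Python sketch of what frequency and spatial-arrangement features from secondary structure can look like. It assumes a 3-state secondary structure string (H = helix, E = strand, C = coil), for example from a secondary structure predictor. The function names `segments` and `ss_features` and every feature they compute are hypothetical illustrations, not the authors' feature set; pitch and species information are not modelled.

```python
from __future__ import annotations

from itertools import groupby


def segments(ss: str) -> list[tuple[str, int]]:
    """Collapse a 3-state string (H/E/C) into (state, run_length) segments."""
    return [(state, len(list(run))) for state, run in groupby(ss)]


def ss_features(ss: str) -> dict[str, float]:
    """Hypothetical secondary-structure features (NOT the paper's 65-dim vector)."""
    if not ss or set(ss) - set("HEC"):
        raise ValueError("expected a non-empty string over {H, E, C}")
    n = len(ss)
    segs = segments(ss)
    helices = [length for state, length in segs if state == "H"]
    strands = [length for state, length in segs if state == "E"]
    # Order of helix/strand segments with coil removed, e.g. H,E,H,E or H,H,E,E.
    order = [state for state, _ in segs if state in "HE"]
    switches = sum(1 for a, b in zip(order, order[1:]) if a != b)
    return {
        # Frequency features: fraction of residues in each state.
        "frac_H": ss.count("H") / n,
        "frac_E": ss.count("E") / n,
        "frac_C": ss.count("C") / n,
        # Segment counts per 100 residues.
        "helix_segs_per_100": 100 * len(helices) / n,
        "strand_segs_per_100": 100 * len(strands) / n,
        # Mean segment lengths (0.0 when the element is absent).
        "mean_helix_len": sum(helices) / len(helices) if helices else 0.0,
        "mean_strand_len": sum(strands) / len(strands) if strands else 0.0,
        # Arrangement proxy: fraction of consecutive H/E segment pairs that switch
        # type. Interleaved (alpha/beta-like) layouts score high, segregated
        # (alpha+beta-like) layouts score low.
        "he_alternation": switches / (len(order) - 1) if len(order) > 1 else 0.0,
    }


if __name__ == "__main__":
    interleaved = "CCHHHHCCEEEECCHHHHCCEEEECC"
    segregated = "HHHHCCHHHHCCEEEECCEEEE"
    print(round(ss_features(interleaved)["he_alternation"], 3))  # 1.0
    print(round(ss_features(segregated)["he_alternation"], 3))   # 0.333
```

To train a classifier on features like these, each dict would be turned into a vector with a fixed key order (for example `[f[k] for k in sorted(f)]`) and passed, together with the four class labels, to a Random Forest implementation such as scikit-learn's `RandomForestClassifier` via `fit(X, y)`. The choice of library is an assumption; the abstract names only the Random Forest algorithm.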

Similar articles

1
Predicting structural class for protein sequences of 40% identity based on features of primary and secondary structure using Random Forest algorithm.
Comput Biol Chem. 2020 Feb;84:107164. doi: 10.1016/j.compbiolchem.2019.107164. Epub 2019 Nov 15.
2
Incorporating secondary structural features into sequence information for predicting protein structural class.
Protein Pept Lett. 2013 Oct;20(10):1079-87. doi: 10.2174/09298665113209990002.
3
Enhanced Protein Structural Class Prediction Using Effective Feature Modeling and Ensemble of Classifiers.
IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2409-2419. doi: 10.1109/TCBB.2020.2979430. Epub 2021 Dec 8.
4
A high-accuracy protein structural class prediction algorithm using predicted secondary structural information.
J Theor Biol. 2010 Dec 7;267(3):272-5. doi: 10.1016/j.jtbi.2010.09.007. Epub 2010 Sep 8.
5
Accurate prediction of protein structural classes by incorporating predicted secondary structure information into the general form of Chou's pseudo amino acid composition.
J Theor Biol. 2014 Mar 7;344:12-8. doi: 10.1016/j.jtbi.2013.11.021. Epub 2013 Dec 6.
6
Prediction of protein structural classes for low-homology sequences based on predicted secondary structure.
BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S9. doi: 10.1186/1471-2105-11-S1-S9.
7
A novel Multi-Agent Ada-Boost algorithm for predicting protein structural class with the information of protein secondary structure.
J Bioinform Comput Biol. 2015 Oct;13(5):1550022. doi: 10.1142/S0219720015500225. Epub 2015 Aug 11.
8
Structural protein fold recognition based on secondary structure and evolutionary information using machine learning algorithms.
Comput Biol Chem. 2021 Apr;91:107456. doi: 10.1016/j.compbiolchem.2021.107456. Epub 2021 Feb 12.
9
A two-stage approach towards protein secondary structure classification.
Med Biol Eng Comput. 2020 Aug;58(8):1723-1737. doi: 10.1007/s11517-020-02194-w. Epub 2020 May 29.
10
High-accuracy prediction of protein structural class for low-similarity sequences based on predicted secondary structure.
Biochimie. 2011 Apr;93(4):710-4. doi: 10.1016/j.biochi.2011.01.001. Epub 2011 Jan 13.

Cited by

1
Prediction of anticancer drug sensitivity using an interpretable model guided by deep learning.
BMC Bioinformatics. 2024 May 9;25(1):182. doi: 10.1186/s12859-024-05669-x.