Structural protein fold recognition based on secondary structure and evolutionary information using machine learning algorithms.

Affiliation

College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.

Publication information

Comput Biol Chem. 2021 Apr;91:107456. doi: 10.1016/j.compbiolchem.2021.107456. Epub 2021 Feb 12.

DOI:10.1016/j.compbiolchem.2021.107456
PMID:33610129
Abstract

Understanding protein function supports research in areas such as gene therapy and the design and development of new drugs. A prerequisite for understanding a protein's function is determining its tertiary structure. Protein structure classification is indispensable for that task, and fold recognition is a commonly used classification method. In this work, protein sequences of 40% identity from the ASTRAL protein classification database are used to predict 27 fold types, most of which belong to four protein structural classes: α, β, α/β and α + β. We extract features from the primary structure using DSSP, PSSM and HMM, methods based on secondary structure and evolutionary information, to convert each protein sequence into a feature vector that machine learning algorithms can process. We then combine LightGBM feature selection with incremental feature selection (IFS) to find the optimal classifier for each of three tree-based algorithms: Random Forest, XGBoost and LightGBM. Bayesian optimization is used to tune the hyper-parameters of these algorithms, and the resulting fold recognition accuracy reaches 93.45%. The result obtained by the proposed model is outstanding among protein fold recognition studies.
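The incremental feature selection (IFS) step named in the abstract can be illustrated with a minimal sketch. It assumes the features have already been ranked by an importance score (in the paper, LightGBM importances) and that some scoring function returns a classifier's accuracy for a given feature subset. The feature names, importance values, and the toy `evaluate` function below are hypothetical stand-ins and are not taken from the paper; a real run would replace `evaluate` with cross-validated accuracy of Random Forest, XGBoost, or LightGBM.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float, List[float]]:
    """Score every top-k prefix of a ranked feature list and keep the best one."""
    if not ranked_features:
        raise ValueError("ranked_features must not be empty")
    best_k, best_score = 0, float("-inf")
    scores: List[float] = []
    for k in range(1, len(ranked_features) + 1):
        score = evaluate(ranked_features[:k])
        scores.append(score)
        if score > best_score:  # strict comparison: ties keep the smaller subset
            best_k, best_score = k, score
    return list(ranked_features[:best_k]), best_score, scores


# Hypothetical importance scores, standing in for LightGBM feature importances.
importance = {"noise_a": 1, "pssm_1": 90, "dssp_2": 40, "hmm_4": 70, "noise_b": 3}
ranked = sorted(importance, key=importance.get, reverse=True)

# Hypothetical accuracy model: informative features add accuracy,
# and every extra feature carries a small cost. Not real data.
GAIN = {"pssm_1": 0.30, "hmm_4": 0.20, "dssp_2": 0.10}


def evaluate(subset: Sequence[str]) -> float:
    return 0.3 + sum(GAIN.get(name, 0.0) for name in subset) - 0.02 * len(subset)


selected, best, scores = incremental_feature_selection(ranked, evaluate)
print(selected)
print(scores)
```

In this toy setup the score rises while informative features are added and falls once only noise features remain, so the selected prefix stops at the three informative features. Bayesian hyper-parameter tuning, as described in the abstract, would then be applied to the classifier trained on the selected subset.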

Similar articles

1
Structural protein fold recognition based on secondary structure and evolutionary information using machine learning algorithms.
Comput Biol Chem. 2021 Apr;91:107456. doi: 10.1016/j.compbiolchem.2021.107456. Epub 2021 Feb 12.
2
Succinylation Site Prediction Based on Protein Sequences Using the IFS-LightGBM (BO) Model.
Comput Math Methods Med. 2020 Nov 10;2020:8858489. doi: 10.1155/2020/8858489. eCollection 2020.
3
Predicting structural class for protein sequences of 40% identity based on features of primary and secondary structure using Random Forest algorithm.
Comput Biol Chem. 2020 Feb;84:107164. doi: 10.1016/j.compbiolchem.2019.107164. Epub 2019 Nov 15.
4
A two-stage approach towards protein secondary structure classification.
Med Biol Eng Comput. 2020 Aug;58(8):1723-1737. doi: 10.1007/s11517-020-02194-w. Epub 2020 May 29.
5
SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition.
BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-8-S4-S2.
6
Improving protein fold recognition using the amalgamation of evolutionary-based and structural based information.
BMC Bioinformatics. 2014;15 Suppl 16(Suppl 16):S12. doi: 10.1186/1471-2105-15-S16-S12. Epub 2014 Dec 8.
7
A protein structural classes prediction method based on predicted secondary structure and PSI-BLAST profile.
Biochimie. 2014 Feb;97:60-5. doi: 10.1016/j.biochi.2013.09.013. Epub 2013 Sep 22.
8
A Composite Approach to Protein Tertiary Structure Prediction: Hidden Markov Model Based on Lattice.
Bull Math Biol. 2019 Mar;81(3):899-918. doi: 10.1007/s11538-018-00542-4. Epub 2018 Dec 10.
9
Protein fold recognition using HMM-HMM alignment and dynamic programming.
J Theor Biol. 2016 Mar 21;393:67-74. doi: 10.1016/j.jtbi.2015.12.018. Epub 2016 Jan 19.
10
Extracting features from protein sequences to improve deep extreme learning machine for protein fold recognition.
J Theor Biol. 2017 May 21;421:1-15. doi: 10.1016/j.jtbi.2017.03.023. Epub 2017 Mar 27.

Cited by

1
Insight into Protein Engineering: From Modelling to Synthesis.
Curr Pharm Des. 2025;31(3):179-202. doi: 10.2174/0113816128349577240927071706.
2
BioS2Net: Holistic Structural and Sequential Analysis of Biomolecules Using a Deep Neural Network.
Int J Mol Sci. 2022 Mar 9;23(6):2966. doi: 10.3390/ijms23062966.