A Method for Prediction of Thermophilic Protein Based on Reduced Amino Acids and Mixed Features.

Authors

Feng Changli, Ma Zhaogui, Yang Deyun, Li Xin, Zhang Jun, Li Yanjuan

Affiliations

College of Information Science and Technology, Taishan University, Tai'an, China.

Department of Rehabilitation, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China.

Publication information

Front Bioeng Biotechnol. 2020 May 5;8:285. doi: 10.3389/fbioe.2020.00285. eCollection 2020.

DOI:10.3389/fbioe.2020.00285
PMID:32432088
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7214540/
Abstract

The thermostability of proteins is a key factor considered during enzyme engineering, and finding a method that can identify thermophilic and non-thermophilic proteins will be helpful for enzyme design. In this study, we established a novel method combining mixed features and machine learning to achieve this recognition task. In this method, an amino acid reduction scheme was adopted to recode the amino acid sequence. Then, the physicochemical characteristics, auto-cross covariance (ACC), and reduced dipeptides were calculated and integrated to form a mixed feature set, which was processed using correlation analysis, feature selection, and principal component analysis (PCA) to remove redundant information. Finally, four machine learning methods and a dataset containing 500 random observations out of 915 thermophilic proteins and 500 random samples out of 793 non-thermophilic proteins were used to train and predict the data. The experimental results showed that 98.2% of thermophilic and non-thermophilic proteins were correctly identified using 10-fold cross-validation. Moreover, our analysis of the final reserved features and removed features yielded information about the crucial, unimportant, and insensitive elements and provided essential information for enzyme design.
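
The recoding step and the reduced-dipeptide features described in the abstract can be illustrated with a minimal sketch. The six-group reduction scheme below is a hypothetical grouping chosen only for illustration; the abstract does not state which scheme the authors adopted, and the function names are likewise assumptions rather than the authors' code.

```python
from itertools import product

# Hypothetical reduction scheme (NOT taken from the paper): each key is the
# representative letter for a group of residues with broadly similar chemistry.
REDUCTION_GROUPS = {
    "A": "AVLIMC",  # aliphatic / hydrophobic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # glycine and proline
}

# Map every standard residue to its group representative.
RESIDUE_TO_GROUP = {
    residue: rep
    for rep, members in REDUCTION_GROUPS.items()
    for residue in members
}


def reduce_sequence(seq):
    """Recode a protein sequence into the reduced alphabet.

    Residues outside the 20 standard amino acids are skipped.
    """
    return "".join(
        RESIDUE_TO_GROUP[r] for r in seq.upper() if r in RESIDUE_TO_GROUP
    )


def reduced_dipeptide_composition(seq):
    """Return the frequency of each reduced dipeptide in the recoded sequence."""
    reduced = reduce_sequence(seq)
    alphabet = sorted(REDUCTION_GROUPS)
    # One feature per ordered pair of group letters (6 x 6 = 36 here).
    counts = {a + b: 0 for a, b in product(alphabet, repeat=2)}
    for i in range(len(reduced) - 1):
        counts[reduced[i:i + 2]] += 1
    total = max(len(reduced) - 1, 1)  # avoid division by zero on short input
    return {dipeptide: n / total for dipeptide, n in counts.items()}


if __name__ == "__main__":
    features = reduced_dipeptide_composition("MKVLDE")
    print(reduce_sequence("MKVLDE"))
    print({k: v for k, v in features.items() if v > 0})
```

With six groups, the dipeptide feature vector has 36 entries instead of the 400 needed for the full 20-letter alphabet, which is the kind of dimensionality saving a reduced alphabet is meant to provide. In the workflow the abstract describes, such features would then be combined with the physicochemical and ACC features before correlation analysis, feature selection, and PCA.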

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/10ab/7214540/f5b0738533ff/fbioe-08-00285-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/10ab/7214540/dab1af32fa68/fbioe-08-00285-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/10ab/7214540/4e2f456e5cbf/fbioe-08-00285-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/10ab/7214540/a7835f9bfe55/fbioe-08-00285-g004.jpg

Similar articles

1
A Method for Prediction of Thermophilic Protein Based on Reduced Amino Acids and Mixed Features.
Front Bioeng Biotechnol. 2020 May 5;8:285. doi: 10.3389/fbioe.2020.00285. eCollection 2020.
2
Prediction of thermophilic proteins using feature selection technique.
J Microbiol Methods. 2011 Jan;84(1):67-70. doi: 10.1016/j.mimet.2010.10.013. Epub 2010 Oct 31.
3
Prediction of thermophilic protein using 2-D general series correlation pseudo amino acid features.
Methods. 2023 Oct;218:141-148. doi: 10.1016/j.ymeth.2023.08.012. Epub 2023 Aug 19.
4
Prediction of thermophilic protein with pseudo amino Acid composition: an approach from combined feature selection and reduction.
Protein Pept Lett. 2011 Jul;18(7):684-9. doi: 10.2174/092986611795446085.
5
Detecting thermophilic proteins through selecting amino acid and dipeptide composition features.
Amino Acids. 2012 May;42(5):1947-53. doi: 10.1007/s00726-011-0923-1. Epub 2011 May 6.
6
A novel sequence-based predictor for identifying and characterizing thermophilic proteins using estimated propensity scores of dipeptides.
Sci Rep. 2021 Dec 10;11(1):23782. doi: 10.1038/s41598-021-03293-w.
7
Prediction of RNA-binding amino acids from protein and RNA sequences.
BMC Bioinformatics. 2011;12 Suppl 13(Suppl 13):S7. doi: 10.1186/1471-2105-12-S13-S7. Epub 2011 Nov 30.
8
DeepTP: A Deep Learning Model for Thermophilic Protein Prediction.
Int J Mol Sci. 2023 Jan 22;24(3):2217. doi: 10.3390/ijms24032217.
9
Discrimination of Thermophilic Proteins and Non-thermophilic Proteins Using Feature Dimension Reduction.
Front Bioeng Biotechnol. 2020 Oct 22;8:584807. doi: 10.3389/fbioe.2020.584807. eCollection 2020.
10
Identification of thermophilic proteins by incorporating evolutionary and acid dissociation information into Chou's general pseudo amino acid composition.
J Theor Biol. 2016 Oct 21;407:138-142. doi: 10.1016/j.jtbi.2016.07.010. Epub 2016 Jul 7.

Cited by

1
Prediction and design of thermostable proteins with a desired melting temperature.
Sci Rep. 2025 May 14;15(1):16683. doi: 10.1038/s41598-025-98667-9.
2
HPClas: A data-driven approach for identifying halophilic proteins based on catBoost.
mLife. 2024 Jul 20;3(4):515-526. doi: 10.1002/mlf2.12125. eCollection 2024 Dec.
3
TemStaPro: protein thermostability prediction using sequence representations from protein language models.
Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae157.
4
Superior protein thermophilicity prediction with protein language model embeddings.
NAR Genom Bioinform. 2023 Oct 11;5(4):lqad087. doi: 10.1093/nargab/lqad087. eCollection 2023 Dec.
5
Discrimination of psychrophilic enzymes using machine learning algorithms with amino acid composition descriptor.
Front Microbiol. 2023 Feb 13;14:1130594. doi: 10.3389/fmicb.2023.1130594. eCollection 2023.
6
Thermo-L-Asparaginases: From the Role in the Viability of Thermophiles and Hyperthermophiles at High Temperatures to a Molecular Understanding of Their Thermoactivity and Thermostability.
Int J Mol Sci. 2023 Jan 31;24(3):2674. doi: 10.3390/ijms24032674.
7
Empirical comparison and analysis of machine learning-based predictors for predicting and analyzing of thermophilic proteins.
EXCLI J. 2022 Mar 2;21:554-570. doi: 10.17179/excli2022-4723. eCollection 2022.
8
NMR Structure and Biophysical Characterization of Thermophilic Single-Stranded DNA Binding Protein from .
Int J Mol Sci. 2022 Mar 13;23(6):3099. doi: 10.3390/ijms23063099.
9
iThermo: A Sequence-Based Model for Identifying Thermophilic Proteins Using a Multi-Feature Fusion Strategy.
Front Microbiol. 2022 Feb 22;13:790063. doi: 10.3389/fmicb.2022.790063. eCollection 2022.
10
Immunoglobulin Classification Based on FC* and GC* Features.
Front Genet. 2022 Jan 24;12:827161. doi: 10.3389/fgene.2021.827161. eCollection 2021.

References

1
RAACBook: a web server of reduced amino acid alphabet for sequence-dependent inference by using Chou's five-step rule.
Database (Oxford). 2019 Jan 1;2019. doi: 10.1093/database/baz131.
2
A novel molecular representation with BiGRU neural networks for learning atom.
Brief Bioinform. 2020 Dec 1;21(6):2099-2111. doi: 10.1093/bib/bbz125.
3
Predicting disease-associated circular RNAs using deep forests combined with positive-unlabeled learning methods.
Brief Bioinform. 2020 Jul 15;21(4):1425-1436. doi: 10.1093/bib/bbz080.
4
SubMito-XGBoost: predicting protein submitochondrial localization by fusing multiple feature information and eXtreme gradient boosting.
Bioinformatics. 2020 Feb 15;36(4):1074-1081. doi: 10.1093/bioinformatics/btz734.
5
Prediction of CYP450 Enzyme-Substrate Selectivity Based on the Network-Based Label Space Division Method.
J Chem Inf Model. 2019 Nov 25;59(11):4577-4586. doi: 10.1021/acs.jcim.9b00749. Epub 2019 Oct 22.
6
iPromoter-2L2.0: Identifying Promoters and Their Types by Combining Smoothing Cutting Window Algorithm and Sequence-Based Features.
Mol Ther Nucleic Acids. 2019 Dec 6;18:80-87. doi: 10.1016/j.omtn.2019.08.008. Epub 2019 Aug 14.
7
Correlation-based channel selection and regularized feature optimization for MI-based BCI.
Neural Netw. 2019 Oct;118:262-270. doi: 10.1016/j.neunet.2019.07.008. Epub 2019 Jul 15.
8
Deep Collaborative Filtering for Prediction of Disease Genes.
IEEE/ACM Trans Comput Biol Bioinform. 2020 Sep-Oct;17(5):1639-1647. doi: 10.1109/TCBB.2019.2907536. Epub 2019 Mar 26.
9
Conserved Disease Modules Extracted From Multilayer Heterogeneous Disease and Gene Networks for Understanding Disease Mechanisms and Predicting Disease Treatments.
Front Genet. 2019 Jan 18;9:745. doi: 10.3389/fgene.2018.00745. eCollection 2018.
10
Identifying Plant Pentatricopeptide Repeat Coding Gene/Protein Using Mixed Feature Extraction Methods.
Front Plant Sci. 2019 Jan 10;9:1961. doi: 10.3389/fpls.2018.01961. eCollection 2018.