TPGPred: A Mixed-Feature-Driven Approach for Identifying Thermophilic Proteins Based on GradientBoosting.

Affiliations

Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China.

Institute of Public Safety Research, Department of Engineering Physics, Tsinghua University, Beijing 100084, China.

Publication information

Int J Mol Sci. 2024 Nov 5;25(22):11866. doi: 10.3390/ijms252211866.

PMID: 39595936
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11594102/
Abstract

Thermophilic proteins maintain their stability and functionality under extreme high-temperature conditions, which makes them important for both fundamental biological research and biotechnological applications. In this study, we developed TPGPred, a machine learning model based on GradientBoosting, to predict thermophilic proteins by leveraging a large-scale dataset of thermophilic and non-thermophilic protein sequences. By combining various machine learning algorithms with feature-engineering methods, we systematically evaluated classification performance and identified the optimal feature combinations and classification models. Trained on a large public dataset of 5652 samples, TPGPred achieved an Accuracy greater than 0.95 and an Area Under the Receiver Operating Characteristic Curve (AUROC) greater than 0.98 on an independent test set of 627 samples. Our findings offer new insights into the identification and classification of thermophilic proteins and provide a solid foundation for developing their industrial applications.
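The abstract describes the pipeline only at a high level: sequence-derived features, a GradientBoosting classifier, and evaluation by Accuracy and AUROC. The sketch below illustrates what such a "mixed feature + gradient boosting" classifier involves, using only the Python standard library. Everything specific here is an assumption, not TPGPred's actual method: the feature set (amino acid composition concatenated with dipeptide composition), the helper names (`mixed_features`, `fit_stump`, `fit_gbm`, `predict_proba`), the hyperparameters (`n_rounds`, `lr`), the stump-based boosting with logistic loss, and the toy sequences. A real pipeline would more likely use a library implementation such as scikit-learn's `GradientBoostingClassifier`.

```python
import math
from itertools import product

# Assumption: only the 20 standard amino acids are counted; other letters are dropped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def mixed_features(seq):
    """Hypothetical mixed feature vector: AAC (20 dims) + DPC (400 dims)."""
    clean = "".join(c for c in seq.upper() if c in AMINO_ACIDS)
    aac = [clean.count(a) / max(len(clean), 1) for a in AMINO_ACIDS]
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(clean) - 1):
        counts[clean[i:i + 2]] += 1
    pairs = max(len(clean) - 1, 1)
    return aac + [counts[d] / pairs for d in DIPEPTIDES]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_stump(X, r):
    """Least-squares decision stump fitted to residuals r: (feature, threshold, left, right)."""
    n = len(X)
    total, total_sq = sum(r), sum(v * v for v in r)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = left_sq = 0.0
        for k in range(1, n):
            prev = order[k - 1]
            left_sum += r[prev]
            left_sq += r[prev] ** 2
            a, b = X[prev][j], X[order[k]][j]
            if a == b:
                continue  # cannot split between equal feature values
            right_sum, right_sq = total - left_sum, total_sq - left_sq
            sse = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
            if best is None or sse < best[0]:
                best = (sse, j, (a + b) / 2, left_sum / k, right_sum / (n - k))
    if best is None:  # every feature is constant: predict the mean residual
        mean = total / n
        return (0, math.inf, mean, mean)
    return best[1:]

def predict_stump(stump, x):
    j, threshold, left, right = stump
    return left if x[j] <= threshold else right

def fit_gbm(X, y, n_rounds=50, lr=0.1):
    """Gradient boosting with logistic loss: each stump fits the negative gradient y - p."""
    p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    f0 = math.log(p / (1 - p))  # initial prediction: log-odds of the positive class
    F = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        F = [fi + lr * predict_stump(stump, x) for fi, x in zip(F, X)]
    return f0, lr, stumps

def predict_proba(model, x):
    f0, lr, stumps = model
    return sigmoid(f0 + lr * sum(predict_stump(s, x) for s in stumps))

def accuracy(y_true, probs, threshold=0.5):
    return sum((p >= threshold) == (t == 1) for t, p in zip(y_true, probs)) / len(y_true)

def auroc(y_true, scores):
    """AUROC as the probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Toy, invented sequences (NOT real data): label 1 = "thermophilic-like", 0 = not.
train_seqs = ["EKRVEKRLEEKIRK", "KEVRRLEKEIKRVE", "RKEEVLKRIEKERV", "EEKRKVIRELKKRE",
              "QNSTGQNASTQGNS", "SNQTGASQNTSGQN", "TQNSGSAQNGTSQA", "NQSGTQSANQGTSN"]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]
X = [mixed_features(s) for s in train_seqs]
model = fit_gbm(X, train_y)
probs = [predict_proba(model, x) for x in X]
print("train accuracy:", accuracy(train_y, probs), "train AUROC:", auroc(train_y, probs))
```

On this separable toy set both training metrics reach 1.0, which says nothing about real-world performance; the figures in the abstract were measured on an independent 627-sample test set. The leaf value here is simply the mean residual, a simplification of the Newton-step leaf update used in many gradient boosting implementations.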

Similar articles

1. TPGPred: A Mixed-Feature-Driven Approach for Identifying Thermophilic Proteins Based on GradientBoosting.
Int J Mol Sci. 2024 Nov 5;25(22):11866. doi: 10.3390/ijms252211866.
2. PTSP-BERT: Predict the thermal stability of proteins using sequence-based bidirectional representations from transformer-embedded features.
Comput Biol Med. 2025 Feb;185:109598. doi: 10.1016/j.compbiomed.2024.109598. Epub 2024 Dec 20.
3. A novel sequence-based predictor for identifying and characterizing thermophilic proteins using estimated propensity scores of dipeptides.
Sci Rep. 2021 Dec 10;11(1):23782. doi: 10.1038/s41598-021-03293-w.
4. ThermoFinder: A sequence-based thermophilic proteins prediction framework.
Int J Biol Macromol. 2024 Jun;270(Pt 2):132469. doi: 10.1016/j.ijbiomac.2024.132469. Epub 2024 May 16.
5. Boosting phosphorylation site prediction with sequence feature-based machine learning.
Proteins. 2020 Feb;88(2):284-291. doi: 10.1002/prot.25801. Epub 2019 Aug 22.
6. SAPPHIRE: A stacking-based ensemble learning framework for accurate prediction of thermophilic proteins.
Comput Biol Med. 2022 Jul;146:105704. doi: 10.1016/j.compbiomed.2022.105704. Epub 2022 Jun 7.
7. Establishment of noninvasive diabetes risk prediction model based on tongue features and machine learning techniques.
Int J Med Inform. 2021 May;149:104429. doi: 10.1016/j.ijmedinf.2021.104429. Epub 2021 Feb 22.
8. DeepTP: A Deep Learning Model for Thermophilic Protein Prediction.
Int J Mol Sci. 2023 Jan 22;24(3):2217. doi: 10.3390/ijms24032217.
9. Machine learning algorithms for predicting COVID-19 mortality in Ethiopia.
BMC Public Health. 2024 Jun 28;24(1):1728. doi: 10.1186/s12889-024-19196-0.
10. Efficacy of automated machine learning models and feature engineering for diagnosis of equivocal appendicitis using clinical and computed tomography findings.
Sci Rep. 2024 Sep 30;14(1):22658. doi: 10.1038/s41598-024-72889-9.