HPClas: A data-driven approach for identifying halophilic proteins based on catBoost.

Author information

Hu Shantong, Wang Xiaoyu, Wang Zhikang, Jiang Menghan, Wang Shihui, Wang Wenya, Song Jiangning, Zhang Guimin

Affiliations

College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China.

Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia.

Publication information

mLife. 2024 Jul 20;3(4):515-526. doi: 10.1002/mlf2.12125. eCollection 2024 Dec.

DOI: 10.1002/mlf2.12125
PMID: 39744092
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11685839/

Abstract

Halophilic proteins possess unique structural properties and show high stability under extreme conditions. This distinct characteristic makes them invaluable for application in various aspects such as bioenergy, pharmaceuticals, environmental clean-up, and energy production. Generally, halophilic proteins are discovered and characterized through labor-intensive and time-consuming wet lab experiments. In this study, we introduce the Halophilic Protein Classifier (HPClas), a machine learning-based classifier developed using the catBoost ensemble learning technique to identify halophilic proteins. Extensive in silico calculations were conducted on a large public dataset of 12,574 samples and HPClas achieved an area under the receiver operating characteristic curve (AUROC) of 0.844 on an independent test set of 200 samples. The source code and curated dataset of HPClas are publicly available at https://github.com/Showmake2/HPClas. In conclusion, HPClas can be explored as a promising tool to aid in the identification of halophilic proteins and accelerate their application in different fields.
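
The abstract reports performance as an AUROC value. As a rough illustration of what that metric measures, the Python sketch below computes AUROC from binary labels and classifier scores using the rank-sum (Mann-Whitney U) identity. It is not taken from the HPClas repository; the function name `auroc` and the toy data are assumptions made for this example only.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for the positive class (e.g. halophilic), 0 for the negative class.
    scores: classifier outputs, where a higher score means "more likely positive".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative sample")

    # Sort sample indices by score, then assign 1-based ranks, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for the positive class, normalised by the number of pos/neg pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Toy data, purely illustrative (not drawn from the HPClas test set).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.3, 0.4, 0.1]
print(auroc(toy_labels, toy_scores))  # 0.75
```

Read this way, an AUROC of 0.844 means that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample about 84% of the time, with ties counted as half.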
