HPClas: A data-driven approach for identifying halophilic proteins based on catBoost.

Author information

Hu Shantong, Wang Xiaoyu, Wang Zhikang, Jiang Menghan, Wang Shihui, Wang Wenya, Song Jiangning, Zhang Guimin

Affiliations

College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China.

Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, Australia.

Publication information

mLife. 2024 Jul 20;3(4):515-526. doi: 10.1002/mlf2.12125. eCollection 2024 Dec.

Abstract

Halophilic proteins possess unique structural properties and show high stability under extreme conditions. These characteristics make them valuable for applications in areas such as bioenergy, pharmaceuticals, environmental clean-up, and energy production. Generally, halophilic proteins are discovered and characterized through labor-intensive and time-consuming wet-lab experiments. In this study, we introduce the Halophilic Protein Classifier (HPClas), a machine learning-based classifier developed using the CatBoost ensemble learning technique to identify halophilic proteins. Extensive in silico calculations were conducted on a large public dataset of 12,574 samples, and HPClas achieved an area under the receiver operating characteristic curve (AUROC) of 0.844 on an independent test set of 200 samples. The source code and curated dataset of HPClas are publicly available at https://github.com/Showmake2/HPClas. In conclusion, HPClas can be explored as a promising tool to aid in the identification of halophilic proteins and accelerate their application in different fields.
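For readers unfamiliar with the reported metric, the following minimal sketch shows how an AUROC value can be computed from binary labels and classifier scores. It is not taken from the HPClas repository: the function name `auroc` and the toy labels and scores are hypothetical, and the pairwise formulation (the fraction of positive/negative pairs in which the positive sample is scored higher, with ties counted as one half) is one standard way to obtain the area under the ROC curve.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    This pairwise estimate equals the area under the ROC curve.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (not from the paper): 1 = halophilic, 0 = non-halophilic.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
toy_auroc = auroc(toy_labels, toy_scores)
```

Under this reading, an AUROC of 0.844 would mean that a randomly chosen halophilic protein from the test set receives a higher score than a randomly chosen non-halophilic one about 84% of the time. The quadratic loop is chosen for clarity; for a 200-sample test set it is more than fast enough.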

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/340d/11685839/e1c042270eda/MLF2-3-515-g001.jpg
