Leveraging protein language model embeddings and logistic regression for efficient and accurate in-silico acidophilic proteins classification.

Affiliations

Institut Teknologi Bandung, School of Electrical Engineering and Informatics, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia; Universitas Pertamina, School of Computer Science, Jl. Teuku Nyak Arief, Jakarta Selatan, DKI Jakarta, Indonesia.

Institut Teknologi Bandung, School of Electrical Engineering and Informatics, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia; Universitas Universal, Kompleks Maha Vihara Duta Maitreya, Bukit Beruntung, Sei Panas, Batam, Kepulauan Riau 29456, Indonesia.

Publication information

Comput Biol Chem. 2024 Oct;112:108163. doi: 10.1016/j.compbiolchem.2024.108163. Epub 2024 Jul 26.

DOI: 10.1016/j.compbiolchem.2024.108163
PMID: 39098138
Abstract

The increasing demand for eco-friendly technologies in biotechnology necessitates effective and sustainable catalysts. Acidophilic proteins, which function optimally in highly acidic environments, hold immense promise for various applications, including food production, biofuels, and bioremediation. However, limited knowledge about these proteins hinders their exploration. This study addresses this gap with in silico methods based on computational tools and machine learning. We propose a novel approach to predict acidophilic proteins using protein language models (PLMs), accelerating discovery without extensive lab work. Our investigation highlights the potential of PLMs for understanding and harnessing acidophilic proteins for scientific and industrial advancements. We introduce the ACE model, which combines a simple logistic regression model with embeddings derived from protein sequences processed by the ProtT5 PLM. On an independent test set, this model achieves an accuracy of 0.91, an F1-score of 0.93, and a Matthews correlation coefficient (MCC) of 0.76. To our knowledge, this is the first application of pre-trained PLM embeddings to acidophilic protein classification. The ACE model serves as a powerful tool for exploring protein acidophilicity, paving the way for future advancements in protein design and engineering.
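The abstract describes the pipeline only at a high level: each sequence is turned into a ProtT5 embedding, a logistic regression classifier labels it, and the classifier is scored by accuracy, F1 and MCC. The sketch below illustrates that general technique in plain Python and is not the authors' code. Several assumptions are made: per-residue embeddings are taken as already computed (real ProtT5-XL vectors have 1024 dimensions; tiny toy vectors stand in here); they are mean-pooled into one vector per protein, a common choice with ProtT5 that the abstract does not confirm for ACE; label 1 means "acidophilic"; and training uses plain batch gradient descent with an optional L2 penalty rather than whatever solver and regularization ACE actually used. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(residue_embeddings: Sequence[Sequence[float]]) -> Vector:
    """Average per-residue vectors (L x d) into one per-protein vector (d).
    Assumption: mean pooling; ACE's actual pooling is not stated in the abstract."""
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(vec[j] for vec in residue_embeddings) / n for j in range(dim)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w: Vector, b: float, x: Sequence[float]) -> float:
    """P(label = 1 | x) under the logistic model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 500,
                 l2: float = 0.0) -> Tuple[Vector, float]:
    """Batch gradient descent on the mean binary cross-entropy plus an
    optional (l2 / 2) * ||w||^2 penalty; the bias is not penalized."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of the loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Vector, b: float, X: Sequence[Sequence[float]],
            threshold: float = 0.5) -> List[int]:
    """Hard 0/1 labels from predicted probabilities."""
    return [1 if predict_proba(w, b, xi) >= threshold else 0 for xi in X]


def binary_metrics(y_true: Sequence[int],
                   y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Accuracy, F1 and Matthews correlation coefficient for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f1, mcc


if __name__ == "__main__":
    # Toy per-residue "embeddings" (2-D instead of ProtT5's 1024-D); labels are made up.
    proteins = [
        [[-2.0, 0.1], [-1.0, 0.3]],  # label 0 (non-acidophilic)
        [[-1.5, 0.2], [-0.5, 0.0]],  # label 0
        [[1.0, 0.2], [2.0, 0.4]],    # label 1 (acidophilic)
        [[0.5, 0.1], [1.5, 0.3]],    # label 1
    ]
    labels = [0, 0, 1, 1]
    X = [mean_pool(p) for p in proteins]
    w, b = train_logreg(X, labels)
    print(binary_metrics(labels, predict(w, b, X)))
```

The toy data are separable, so the demo scores perfectly on its own training set; it says nothing about the 0.91 / 0.93 / 0.76 reported above, which come from real ProtT5 embeddings on an independent test set. MCC is computed from all four cells of the confusion matrix, which is why it can sit well below F1 when the negative class is predicted less reliably than the positive one.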

Similar articles

1
Leveraging protein language model embeddings and logistic regression for efficient and accurate in-silico acidophilic proteins classification.
Comput Biol Chem. 2024 Oct;112:108163. doi: 10.1016/j.compbiolchem.2024.108163. Epub 2024 Jul 26.
2
Classifying alkaliphilic proteins using embeddings from protein language model.
Comput Biol Med. 2024 May;173:108385. doi: 10.1016/j.compbiomed.2024.108385. Epub 2024 Mar 26.
3
SumoPred-PLM: human SUMOylation and SUMO2/3 sites Prediction using Pre-trained Protein Language Model.
NAR Genom Bioinform. 2024 Feb 7;6(1):lqae011. doi: 10.1093/nargab/lqae011. eCollection 2024 Mar.
4
Assessing the role of evolutionary information for enhancing protein language model embeddings.
Sci Rep. 2024 Sep 5;14(1):20692. doi: 10.1038/s41598-024-71783-8.
5
LMCrot: an enhanced protein crotonylation site predictor by leveraging an interpretable window-level embedding from a transformer-based protein language model.
Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae290.
6
Enhancing ECG signal classification through pre-trained stacked-CNN embeddings: a transfer learning approach.
Biomed Phys Eng Express. 2024 May 9;10(4). doi: 10.1088/2057-1976/ad40b0.
7
Embeddings from protein language models predict conservation and variant effects.
Hum Genet. 2022 Oct;141(10):1629-1647. doi: 10.1007/s00439-021-02411-y. Epub 2021 Dec 30.
8
NCSP-PLM: An ensemble learning framework for predicting non-classical secreted proteins based on protein language models and deep learning.
Math Biosci Eng. 2024 Jan;21(1):1472-1488. doi: 10.3934/mbe.2024063. Epub 2022 Dec 28.
9
Novel machine learning approaches revolutionize protein knowledge.
Trends Biochem Sci. 2023 Apr;48(4):345-359. doi: 10.1016/j.tibs.2022.11.001. Epub 2022 Dec 9.
10
PLM_Sol: predicting protein solubility by benchmarking multiple protein language models with the updated Escherichia coli protein solubility dataset.
Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae404.