NFEmbed: modeling nitrogenase activity via classification and regression with pretrained protein embeddings.

Author information

Nafi Md Muhaiminul Islam, Mohaimin Abdullah Al

Affiliations

Department of CSE, BUET, Dhaka 1000, Bangladesh.

Department of CSE, United International University (UIU), Dhaka 1212, Bangladesh.

Publication information

Bioinform Adv. 2025 Aug 23;5(1):vbaf204. doi: 10.1093/bioadv/vbaf204. eCollection 2025.

DOI:10.1093/bioadv/vbaf204

PMID:40926956

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12417089/

Abstract

MOTIVATION

Heavy use of synthetic nitrogen fertilizers to meet the growing demand for food has caused severe environmental impacts, such as declining crop yields and eutrophication. One promising alternative is to use nitrogen-fixing microorganisms, which rely on the nitrogenase enzyme, as biofertilizers. The same goal could also be pursued by expressing a functional nitrogenase enzyme in the cells of cereal crops.

RESULTS

In this study, we used machine learning techniques to predict microbial strains with a high potential for nitrogenase activity. The objective was to enable the screening and ranking of candidate strains based on genomic information. We explored several protein language model embeddings for this prediction task and built two stacking ensemble models. The first, NFEmbed-C, is a classifier that uses k-Nearest Neighbors as the base learner and Random Forest as the meta learner. The second, NFEmbed-R, is a regressor that combines Decision Tree Regressor and eXtreme Gradient Boosting Regressor as base learners, with Support Vector Regressor as the meta learner. On the test set, both NFEmbed-C and NFEmbed-R performed better than the state-of-the-art methods, with improvements ranging from 0% to 11.2% and from 30% to 51%, respectively. NFEmbed-R achieved a score of 0.783, an MSE of 0.158, and an RMSE of 0.398, while NFEmbed-C achieved a sensitivity of 0.949, an F1 score of 0.892, and a Matthews Correlation Coefficient of 0.784 on the test set.
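The two stacking layouts described above can be sketched with scikit-learn. This is only an illustration under stated assumptions, not the authors' implementation: the embeddings are random synthetic vectors, the targets are made up, all hyperparameters are defaults or arbitrary choices, the helper names are hypothetical, and scikit-learn's GradientBoostingRegressor is used as a stand-in for the eXtreme Gradient Boosting Regressor to avoid an extra dependency.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestClassifier,
    StackingClassifier,
    StackingRegressor,
)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def make_synthetic_embeddings(n_samples=200, n_features=32, seed=0):
    """Random vectors standing in for protein language model embeddings."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    # Hypothetical continuous "activity" target driven by two features.
    activity = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n_samples)
    # Hypothetical binary label: above-median activity counts as "high".
    label = (activity > np.median(activity)).astype(int)
    return X, activity, label


def build_classifier():
    """k-NN base learner with a Random Forest meta learner."""
    return StackingClassifier(
        estimators=[("knn", KNeighborsClassifier(n_neighbors=5))],
        final_estimator=RandomForestClassifier(n_estimators=100, random_state=0),
        cv=5,  # meta learner is trained on out-of-fold base predictions
    )


def build_regressor():
    """Decision tree and gradient boosting base learners with an SVR meta learner."""
    return StackingRegressor(
        estimators=[
            ("tree", DecisionTreeRegressor(random_state=0)),
            # Assumption: stand-in for XGBoost's regressor.
            ("gbr", GradientBoostingRegressor(random_state=0)),
        ],
        final_estimator=SVR(),
        cv=5,
    )


X, activity, label = make_synthetic_embeddings()
X_train, X_test = X[:150], X[150:]

clf = build_classifier().fit(X_train, label[:150])
reg = build_regressor().fit(X_train, activity[:150])

class_pred = clf.predict(X_test)      # predicted high/low activity labels
activity_pred = reg.predict(X_test)   # predicted continuous activity values
```

In both cases the meta learner sees only the cross-validated predictions of the base learners, which is what distinguishes stacking from simple averaging.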

AVAILABILITY AND IMPLEMENTATION

We performed our analysis in Python; code is available at https://github.com/nafcoder/NFEmbed.
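For readers unfamiliar with the evaluation metrics named in the results, the following sketch shows their standard definitions in plain Python. The function names and toy inputs are hypothetical and are not taken from the NFEmbed repository. Note that RMSE is the square root of MSE, so the reported 0.398 and 0.158 agree up to rounding.

```python
import math


def regression_metrics(y_true, y_pred):
    """Mean squared error and its square root."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    return {"mse": mse, "rmse": math.sqrt(mse)}


def classification_metrics(y_true, y_pred):
    """Sensitivity, F1 score, and Matthews Correlation Coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn)        # fraction of true positives recovered
    f1 = 2 * tp / (2 * tp + fp + fn)    # harmonic mean of precision and recall
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom   # undefined if any marginal count is zero
    return {"sensitivity": sensitivity, "f1": f1, "mcc": mcc}


# Toy inputs, purely illustrative.
reg_scores = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
clf_scores = classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
```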
