NFEmbed: modeling nitrogenase activity via classification and regression with pretrained protein embeddings.

Author information

Nafi Md Muhaiminul Islam, Mohaimin Abdullah Al

Affiliations

Department of CSE, BUET, Dhaka 1000, Bangladesh.

Department of CSE, United International University (UIU), Dhaka 1212, Bangladesh.

Publication information

Bioinform Adv. 2025 Aug 23;5(1):vbaf204. doi: 10.1093/bioadv/vbaf204. eCollection 2025.

DOI:10.1093/bioadv/vbaf204
PMID:40926956
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12417089/
Abstract

MOTIVATION

The heavy use of synthetic nitrogen fertilizers to meet increasing food demand has led to severe environmental impacts, such as declining crop yields and eutrophication. One promising alternative is to use nitrogen-fixing microorganisms, which rely on the nitrogenase enzyme, as biofertilizers. Another is to express a functional nitrogenase enzyme directly in the cells of cereal crops.

RESULTS

In this study, we used machine learning to predict microbial strains with high potential for nitrogenase activity, with the objective of screening and ranking candidate strains from genomic information. We explored several protein language model embeddings for this prediction task and built two stacking ensemble models. The first, NFEmbed-C, uses k-Nearest Neighbors as the base learner and Random Forest as the meta learner. The second, NFEmbed-R, combines a Decision Tree Regressor and an eXtreme Gradient Boosting Regressor as base learners, with a Support Vector Regressor as the meta learner. On the test set, NFEmbed-C and NFEmbed-R matched or outperformed the state-of-the-art methods, with improvements ranging from 0% to 11.2% and from 30% to 51%, respectively. NFEmbed-R achieved an R² score of 0.783, an MSE of 0.158, and an RMSE of 0.398, while NFEmbed-C achieved a sensitivity of 0.949, an F1 score of 0.892, and a Matthews Correlation Coefficient of 0.784.
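The two stacking designs described above can be sketched with scikit-learn's `StackingClassifier` and `StackingRegressor`. This is a minimal illustration under stated assumptions, not the authors' code (which is in the linked repository): random vectors stand in for the protein language model embeddings, the hyperparameters (`n_neighbors=5`, `max_depth=5`, `cv=5`, an RBF-kernel SVR) are hypothetical choices, the builder function names are hypothetical, and scikit-learn's `GradientBoostingRegressor` is swapped in for the XGBoost regressor so the sketch depends only on NumPy and scikit-learn.

```python
# Minimal sketch of the two stacking layouts named in the abstract.
# Assumptions: random placeholder features instead of real pLM embeddings,
# hypothetical hyperparameters, and GradientBoostingRegressor used as a
# stand-in for XGBoost's regressor.
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestClassifier,
    StackingClassifier,
    StackingRegressor,
)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def build_nfembed_c_like(seed: int = 0) -> StackingClassifier:
    """kNN base learner feeding a Random Forest meta learner (NFEmbed-C layout)."""
    return StackingClassifier(
        estimators=[("knn", KNeighborsClassifier(n_neighbors=5))],
        final_estimator=RandomForestClassifier(n_estimators=100, random_state=seed),
        cv=5,  # meta learner is trained on out-of-fold base predictions
    )


def build_nfembed_r_like(seed: int = 0) -> StackingRegressor:
    """Decision tree + gradient boosting base learners, SVR meta learner
    (NFEmbed-R layout, with gradient boosting standing in for XGBoost)."""
    return StackingRegressor(
        estimators=[
            ("dt", DecisionTreeRegressor(max_depth=5, random_state=seed)),
            ("gbr", GradientBoostingRegressor(random_state=seed)),
        ],
        final_estimator=SVR(kernel="rbf"),
        cv=5,
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 64))                # placeholder "embeddings"
    y_cls = (X[:, 0] + X[:, 1] > 0).astype(int)   # placeholder active/inactive labels
    y_reg = X[:, 0] - 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

    clf = build_nfembed_c_like().fit(X[:150], y_cls[:150])
    reg = build_nfembed_r_like().fit(X[:150], y_reg[:150])
    print("classifier accuracy:", clf.score(X[150:], y_cls[150:]))
    print("regressor R^2:", reg.score(X[150:], y_reg[150:]))
```

In both ensembles, `cv=5` makes scikit-learn fit the meta learner on cross-validated predictions of the base learners rather than on their in-sample outputs, which keeps the meta learner from simply trusting base learners that have memorized the training data.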

AVAILABILITY AND IMPLEMENTATION

We performed our analysis in Python; code is available at https://github.com/nafcoder/NFEmbed.
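The test-set metrics quoted in the Results are standard ones. As a hedged illustration (not the authors' evaluation code), the sketch below computes them in plain Python from confusion-matrix counts and from paired true/predicted values. It assumes the regression "score" is the coefficient of determination R², and the function names are hypothetical; in practice, `sklearn.metrics` provides `recall_score`, `f1_score`, `matthews_corrcoef`, `mean_squared_error` and `r2_score` for the same quantities.

```python
# Plain-Python definitions of the reported metrics (illustrative sketch).
import math
from typing import Sequence


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero.
    return num / den if den else 0.0


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, precision, F1 and MCC from confusion-matrix counts."""
    sensitivity = _safe_div(tp, tp + fn)  # recall of the positive class
    precision = _safe_div(tp, tp + fp)
    f1 = _safe_div(2 * precision * sensitivity, precision + sensitivity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, denom)
    return {"sensitivity": sensitivity, "precision": precision, "f1": f1, "mcc": mcc}


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """MSE, RMSE and R^2 = 1 - SS_res / SS_tot."""
    n = len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    mse = ss_res / n
    r2 = 1.0 - ss_res / ss_tot  # assumes y_true is not constant (ss_tot > 0)
    return {"mse": mse, "rmse": math.sqrt(mse), "r2": r2}


# Consistency check on the reported NFEmbed-R figures: RMSE is the square
# root of MSE, and 0.398 ** 2 = 0.158404, which rounds to 0.158.
print(round(0.398 ** 2, 3))  # → 0.158
```

The last line shows that the reported RMSE (0.398) and MSE (0.158) agree up to rounding to three decimals.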


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ac6/12417089/99a1d53a214e/vbaf204f9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ac6/12417089/5cd543836f30/vbaf204f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ac6/12417089/42c94be06a95/vbaf204f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ac6/12417089/b4cc5fde8da0/vbaf204f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ac6/12417089/e7902266332a/vbaf204f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ac6/12417089/57202d12ac9c/vbaf204f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ac6/12417089/6861dbb40781/vbaf204f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ac6/12417089/6c74b927ebf7/vbaf204f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ac6/12417089/1a81b50f178a/vbaf204f8.jpg

Similar articles

1
NFEmbed: modeling nitrogenase activity via classification and regression with pretrained protein embeddings.
Bioinform Adv. 2025 Aug 23;5(1):vbaf204. doi: 10.1093/bioadv/vbaf204. eCollection 2025.
2
Prescription of Controlled Substances: Benefits and Risks.
3
Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.
Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.
4
Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
5
Supervised Machine Learning Models for Predicting Sepsis-Associated Liver Injury in Patients With Sepsis: Development and Validation Study Based on a Multicenter Cohort Study.
J Med Internet Res. 2025 May 26;27:e66733. doi: 10.2196/66733.
6
Systemic pharmacological treatments for chronic plaque psoriasis: a network meta-analysis.
Cochrane Database Syst Rev. 2021 Apr 19;4(4):CD011535. doi: 10.1002/14651858.CD011535.pub4.
7
A New Measure of Quantified Social Health Is Associated With Levels of Discomfort, Capability, and Mental and General Health Among Patients Seeking Musculoskeletal Specialty Care.
Clin Orthop Relat Res. 2025 Apr 1;483(4):647-663. doi: 10.1097/CORR.0000000000003394. Epub 2025 Feb 5.
8
Systemic pharmacological treatments for chronic plaque psoriasis: a network meta-analysis.
Cochrane Database Syst Rev. 2020 Jan 9;1(1):CD011535. doi: 10.1002/14651858.CD011535.pub3.
9
Aspects of Genetic Diversity, Host Specificity and Public Health Significance of Single-Celled Intestinal Parasites Commonly Observed in Humans and Mostly Referred to as 'Non-Pathogenic'.
APMIS. 2025 Sep;133(9):e70036. doi: 10.1111/apm.70036.
10
Development and validation of a machine learning-based model for predicting intraoperative blood loss during burn surgery.
Surgery. 2025 Aug;184:109445. doi: 10.1016/j.surg.2025.109445. Epub 2025 May 29.

References cited in this article

1
Carmna: classification and regression models for nitrogenase activity based on a pretrained large protein language model.
Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf197.
2
pNPs-CapsNet: Predicting Neuropeptides Using Protein Language Models and FastText Encoding-Based Weighted Multi-View Feature Integration with Deep Capsule Neural Network.
ACS Omega. 2025 Mar 18;10(12):12403-12416. doi: 10.1021/acsomega.4c11449. eCollection 2025 Apr 1.
3
Molecular sorting of nitrogenase catalytic cofactors.
J Biol Chem. 2025 Mar;301(3):108291. doi: 10.1016/j.jbc.2025.108291. Epub 2025 Feb 10.
4
TargetCLP: clathrin proteins prediction combining transformed and evolutionary scale modeling-based multi-view features via weighted feature integration approach.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf026.
5
Simulating 500 million years of evolution with a language model.
Science. 2025 Feb 21;387(6736):850-858. doi: 10.1126/science.ads0018. Epub 2025 Jan 16.
6
pACP-HybDeep: predicting anticancer peptides using binary tree growth based transformer and structural feature encoding with deep-hybrid learning.
Sci Rep. 2025 Jan 2;15(1):565. doi: 10.1038/s41598-024-84146-0.
7
DeepAIPs-Pred: Predicting Anti-Inflammatory Peptides Using Local Evolutionary Transformation Images and Structural Embedding-Based Optimal Descriptors with Self-Normalized BiTCNs.
J Chem Inf Model. 2024 Dec 23;64(24):9609-9625. doi: 10.1021/acs.jcim.4c01758. Epub 2024 Dec 3.
8
GraphPBSP: Protein binding site prediction based on Graph Attention Network and pre-trained model ProstT5.
Int J Biol Macromol. 2024 Dec;282(Pt 1):136933. doi: 10.1016/j.ijbiomac.2024.136933. Epub 2024 Oct 28.
9
A hybrid residue based sequential encoding mechanism with XGBoost improved ensemble model for identifying 5-hydroxymethylcytosine modifications.
Sci Rep. 2024 Sep 6;14(1):20819. doi: 10.1038/s41598-024-71568-z.
10
Euclidean-Distance-Preserved Feature Reduction for efficient person re-identification.
Neural Netw. 2024 Dec;180:106572. doi: 10.1016/j.neunet.2024.106572. Epub 2024 Aug 8.