Escuela de Ciencias Biológicas e Ingeniería, Universidad de Investigación de Tecnología Experimental Yachay, Urcuquí, Imbabura, 100115, Ecuador.
F1000Res. 2022 Feb 9;11:164. doi: 10.12688/f1000research.107925.1. eCollection 2022.
Biological fixation of atmospheric nitrogen by microorganisms is environmentally and industrially important because it increases soil fertility and productivity. The present work proposes a new high-precision system that recognizes amino acid sequences of the NifH component of the nitrogenase enzyme as a promising way to improve the identification of diazotrophic bacteria. For this purpose, data retrieved from UniProt were processed into a dataset of 4911 NifH and 4782 non-NifH amino acid sequences. Features were then extracted with two methods, (i) k-mer counting and (ii) embedding layers, each converting the amino acid chains into numerical vectors. In the embedding approach, the data were further passed through an external trainable convolutional layer, which received a matrix of uniform dimensions and applied convolutional filters to obtain the feature maps of the model. Finally, a deep neural network served as the primary model to classify each amino acid sequence as NifH or non-NifH. In performance evaluation experiments, the model achieved an accuracy of 96.4%, a sensitivity of 95.2%, and a specificity of 96.7%. In summary, an amino acid sequence-based feature extraction method that uses a neural network to detect nitrogen-fixing organisms has been proposed and implemented. NIFtHool is available at https://nifthool.anvil.app/.
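The k-mer counting step can be illustrated with a minimal sketch. The abstract does not state the value of k, the residue alphabet, or how non-standard residues are handled, so k = 2, the 20 standard amino acids, and ignoring any k-mer with another letter are all assumptions here; the function name kmer_vector is hypothetical.

```python
from collections import Counter
from itertools import product

# Assumed alphabet: the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_vector(sequence: str, k: int = 2) -> list[int]:
    """Count overlapping k-mers and return the counts in a fixed vocabulary order."""
    sequence = sequence.upper()
    # Fixed vocabulary of all 20**k possible k-mers, so every sequence maps
    # to a vector of the same length regardless of its own length.
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    # Slide a window of width k over the sequence (overlapping k-mers).
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # k-mers containing non-standard letters (e.g. X) are not in the
    # vocabulary and are therefore ignored.
    return [counts[kmer] for kmer in vocabulary]
```

For example, kmer_vector("MAMRQCAIYGKGGIGKST") returns a 400-element vector whose entries sum to 17, because an 18-residue string contains 17 overlapping dimers. Fixing the vocabulary is what makes sequences of different lengths comparable as inputs to a classifier.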
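Before an embedding layer, sequences of different lengths are commonly integer-encoded and padded or truncated to a common length, which gives an input matrix of uniform dimensions. The abstract does not say how its uniform matrix was built, so the following is only a sketch of this common approach. The index scheme (0 reserved for padding and unknown residues, 1 to 20 for the standard amino acids), the zero post-padding, and the function name encode_and_pad are assumptions.

```python
# Assumed alphabet: the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Index 0 is reserved for padding (and unknown residues); residues map to 1..20.
CHAR_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode_and_pad(sequences: list[str], max_len: int) -> list[list[int]]:
    """Integer-encode sequences and pad or truncate each one to max_len."""
    matrix = []
    for seq in sequences:
        # Truncate to max_len, then map each residue to its integer index.
        ids = [CHAR_TO_INDEX.get(aa, 0) for aa in seq.upper()[:max_len]]
        # Post-pad with zeros so every row has exactly max_len entries.
        ids += [0] * (max_len - len(ids))
        matrix.append(ids)
    return matrix
```

For example, encode_and_pad(["MAK", "GG"], 4) gives [[11, 1, 9, 0], [6, 6, 0, 0]]. Each row can then be looked up in a trainable embedding table, producing the fixed-size matrix on which convolutional filters compute feature maps.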
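The reported accuracy, sensitivity, and specificity follow the standard confusion-matrix definitions. The sketch below computes them with NifH treated as the positive class (label 1) and non-NifH as the negative class (label 0); this labeling and the function name classification_metrics are assumptions for illustration.

```python
def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, sensitivity and specificity, with 1 = NifH (positive), 0 = non-NifH."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # NifH correctly detected
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-NifH correctly rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-NifH called NifH
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # NifH missed
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
    }
```

For example, with y_true = [1, 1, 1, 0, 0] and y_pred = [1, 1, 0, 0, 1], the accuracy is 0.6, the sensitivity is 2/3, and the specificity is 0.5. The function assumes both classes are present in y_true; otherwise one of the ratios divides by zero.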