Department of Electronics Engineering, Universidad Técnica Federico Santa María, Valparaiso 2390123, Chile.
Laboratorio de Microbiología Molecular y Biotecnología Ambiental, Departamento de Química & Centro de Biotecnología Daniel Alkalay Lowitt, Universidad Técnica Federico Santa María, Valparaiso 2390123, Chile.
Genes (Basel). 2022 Jun 23;13(7):1126. doi: 10.3390/genes13071126.
Promoter identification is a fundamental step in understanding bacterial gene regulation. However, accurate and fast classification of bacterial promoters remains challenging. Recent methods based on deep convolutional networks have been applied to identify and classify bacterial promoters recognized by sigma (σ) factors and RNA polymerase subunits, which increase affinity for specific DNA sequences to modulate transcription in response to nutritional or environmental changes. This work presents PromoterLCNN, a new multiclass promoter prediction model based on convolutional neural networks (CNNs) that classifies promoters into the subclasses σ70, σ24, σ32, σ38, σ28, and σ54. The model is a light, fast, and simple two-stage multiclass CNN architecture for promoter identification and classification. Training and testing were performed on a benchmark dataset, part of RegulonDB. Evaluated on four metrics, accuracy (Acc), sensitivity (Sn), specificity (Sp), and Matthews correlation coefficient (MCC), PromoterLCNN performed similarly to or better than other CNN-based classifiers, which commonly use a cascade architecture, while reducing the time needed for training, prediction, and hyperparameter optimization by approximately 30-90% without compromising classification quality.
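The four evaluation metrics named in the abstract (Acc, Sn, Sp, MCC) have standard definitions in terms of confusion-matrix counts. The following is a minimal sketch of those definitions, not the authors' evaluation code: the function name `binary_metrics` and the example counts are hypothetical, and treating each σ subclass one-vs-rest to obtain the counts is an assumption rather than something the abstract states.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute Acc, Sn, Sp, and MCC from binary confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives,
    and false negatives. Undefined ratios (zero denominator) are
    reported as 0.0, which is a convention chosen for this sketch.
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not results from the paper).
    for name, value in binary_metrics(tp=40, tn=45, fp=5, fn=10).items():
        print(f"{name}: {value:.3f}")
```

MCC is usually reported alongside accuracy because it stays informative when classes are imbalanced, which is common when one σ subclass has far more examples than the others.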