Hu Wenxing, Li Yelin, Wu Yan, Guan Lixin, Li Mengshan
College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China.
iScience. 2024 May 19;27(6):110030. doi: 10.1016/j.isci.2024.110030. eCollection 2024 Jun 21.
Enhancers are genomic DNA elements that regulate the expression of neighboring genes and are crucial for biological processes such as cell differentiation and stress response. However, current machine learning methods for predicting DNA enhancers often underutilize hidden features in gene sequences, which limits model accuracy. This article therefore proposes PDCNN, a deep learning-based enhancer prediction method. PDCNN extracts statistical nucleotide representations from gene sequences, capturing the positional distribution of nucleotides in modifier-like DNA sequences. Its convolutional neural network structure employs dual convolutional and fully connected layers. The network is trained by iteratively minimizing a cross-entropy loss function with a gradient descent algorithm, which improves prediction accuracy. Model parameters are tuned to select the optimal combination for training, achieving over 95% accuracy. Comparative analysis with traditional methods and existing models demonstrates PDCNN's robust feature extraction capability. It outperforms advanced machine learning methods in identifying DNA enhancers, presenting an effective method with broad implications for genomics, biology, and medical research.
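The two ideas the abstract names, a position-dependent statistical encoding of nucleotides and training by gradient descent on a cross-entropy loss, can be illustrated with a minimal sketch. It is not the PDCNN implementation: the abstract does not specify the encoding, so the per-position frequency difference used here is an assumption, and a plain logistic classifier stands in for the convolutional network to keep the example short. The toy sequences and all function names are hypothetical.

```python
import math

NUCLEOTIDES = "ACGT"

def position_frequencies(seqs):
    """Frequency of each nucleotide at each position across equal-length sequences."""
    freqs = []
    for i in range(len(seqs[0])):
        counts = {n: 0 for n in NUCLEOTIDES}
        for s in seqs:
            counts[s[i]] += 1
        freqs.append({n: counts[n] / len(seqs) for n in NUCLEOTIDES})
    return freqs

def encode(seq, pos_freqs, neg_freqs):
    """Assumed encoding: per position, frequency of the observed base among
    positives minus its frequency among negatives."""
    return [pos_freqs[i][b] - neg_freqs[i][b] for i, b in enumerate(seq)]

def predict(weights, bias, x):
    """Logistic output in (0, 1); a stand-in for the network's final layer."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(weights, bias, X, y):
    """Mean binary cross-entropy over the data set."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, t in zip(X, y):
        p = predict(weights, bias, x)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(X)

def train(X, y, lr=0.5, epochs=200):
    """Full-batch gradient descent on the cross-entropy loss."""
    weights, bias, losses = [0.0] * len(X[0]), 0.0, []
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, t in zip(X, y):
            err = predict(weights, bias, x) - t  # dLoss/dz for sigmoid + cross-entropy
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
        losses.append(cross_entropy(weights, bias, X, y))
    return weights, bias, losses

# Hypothetical toy data: 1 = enhancer-like, 0 = non-enhancer.
positives = ["ACGTAC", "ACGTAA", "ACGTCC", "ACGAAC"]
negatives = ["TTGCAT", "TGGCAT", "TTGCTT", "GTGCAT"]

pos_freqs = position_frequencies(positives)
neg_freqs = position_frequencies(negatives)
X = [encode(s, pos_freqs, neg_freqs) for s in positives + negatives]
y = [1] * len(positives) + [0] * len(negatives)

weights, bias, losses = train(X, y)
print(f"loss after first epoch: {losses[0]:.4f}, after last epoch: {losses[-1]:.4f}")
```

In a realistic setting the position frequencies would be estimated from the training split only, so that the encoding does not leak information from held-out sequences.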