Li HongFei, Zhang Jingyu, Zhao Yuming, Yang Wen
College of Life Science, Northeast Forestry University, Harbin, China.
College of Information and Computer Engineering, Northeast Forestry University, Harbin, China.
Front Microbiol. 2023 Mar 2;14:1141227. doi: 10.3389/fmicb.2023.1141227. eCollection 2023.
The promoter is an important noncoding DNA regulatory element that binds RNA polymerase to activate the expression of downstream genes. In industry, artificial arginine is mainly synthesized by … . Replication of specific promoter regions can increase arginine production, so it is necessary to accurately locate the promoters in … . In wet-lab experiments, promoter identification depends on sigma factors and DNA splicing technology, which is laborious. To identify the promoters in … quickly and conveniently, we developed a method based on a novel feature representation and feature selection: DNA sequences are described by statistical parameters of multiple physicochemical properties, and redundant features are filtered by combining analysis of variance with hierarchical clustering. The model achieves a prediction accuracy of 91.6%, with a sensitivity of 91.9% for identifying promoters and a specificity of 91.2% for identifying non-promoters. In addition, the model correctly identifies 181 promoters and 174 non-promoters among 400 independent samples, which demonstrates the robustness of the developed prediction model.
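The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The property tables `PROPERTIES`, the choice of mean and variance as the statistical parameters, the distance `1 - |r|`, the `max_dist` cutoff, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
from statistics import mean, pvariance

# Hypothetical per-nucleotide property values, for illustration only;
# the physicochemical property tables used in the paper are not given here.
PROPERTIES = {
    "prop_a": {"A": 0.1, "C": 0.9, "G": 0.8, "T": 0.2},
    "prop_b": {"A": 1.0, "C": 0.3, "G": 0.4, "T": 0.9},
}


def encode(seq):
    """Describe a DNA sequence by the mean and variance of each property profile."""
    features = []
    for table in PROPERTIES.values():
        profile = [table[base] for base in seq.upper()]
        features += [mean(profile), pvariance(profile)]
    return features


def anova_f(pos, neg):
    """One-way ANOVA F statistic of a single feature across two classes."""
    grand = mean(pos + neg)
    between = (len(pos) * (mean(pos) - grand) ** 2
               + len(neg) * (mean(neg) - grand) ** 2)
    within = (sum((x - mean(pos)) ** 2 for x in pos)
              + sum((x - mean(neg)) ** 2 for x in neg))
    if within == 0:
        return 0.0 if between == 0 else float("inf")
    return between / (within / (len(pos) + len(neg) - 2))


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_features(X, y, max_dist=0.2):
    """Keep the highest-F feature from each cluster of correlated features.

    Clusters are those of single-linkage hierarchical clustering on the
    distance 1 - |r|, cut at max_dist. That cut equals the connected
    components of the graph joining features no farther apart than max_dist.
    """
    n = len(X[0])
    cols = [[row[j] for row in X] for j in range(n)]
    f = [anova_f([v for v, lab in zip(c, y) if lab == 1],
                 [v for v, lab in zip(c, y) if lab == 0]) for c in cols]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if 1 - abs(pearson(cols[i], cols[j])) <= max_dist:
                parent[find(j)] = find(i)

    best = {}
    for j in range(n):
        root = find(j)
        if root not in best or f[j] > f[best[root]]:
            best[root] = j
    return sorted(best.values())


def accuracy(correct, total):
    """Fraction of samples classified correctly."""
    return correct / total
```

Applied to the independent test reported above, `accuracy(181 + 174, 400)` gives 0.8875, that is, 355 of 400 samples classified correctly. Sensitivity and specificity on that set cannot be derived from the abstract alone, because the split between promoters and non-promoters among the 400 samples is not stated.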