IEEE/ACM Trans Comput Biol Bioinform. 2019 Jul-Aug;16(4):1316-1321. doi: 10.1109/TCBB.2017.2666141. Epub 2017 Feb 8.
Promoters are DNA regulatory elements located directly upstream of, or at the 5' end of, the transcription start site (TSS), and they control the initiation of gene transcription. With a large number of microbial genomes now sequenced, there is a pressing need for computational methods that predict bacterial promoters accurately. In this work, a sequence-based predictor named "iPro70-PseZNC" was designed for identifying sigma70 promoters in prokaryotes. In the predictor, DNA sequence samples are formulated with a novel pseudo nucleotide composition, called PseZNC, which incorporates the multi-window Z-curve composition and six local DNA structural properties. In 5-fold cross-validation on our benchmark dataset, the predictor obtained an area under the receiver operating characteristic curve (AUC) of 0.909, indicating that it is promising and can serve as a useful guide in this area. Further studies showed that PseZNC performs better than the multi-window Z-curve composition alone. For the convenience of researchers, a user-friendly online service was established and is freely accessible at http://lin.uestc.edu.cn/server/iPro70-PseZNC. The PseZNC approach can also be extended to other DNA-related problems.
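To make the "multi-window Z-curve composition" idea concrete, the following is a minimal sketch of the standard mononucleotide Z-curve transform applied over sliding windows. It is an illustration only and not the authors' implementation: the function names `z_curve` and `multi_window_z_curve`, and the `window` and `step` parameters, are assumptions, since the abstract does not state the window sizes or the exact Z-curve variants used in PseZNC. The six local DNA structural properties are not modeled here.

```python
from typing import List


def z_curve(seq: str) -> List[float]:
    """Return the normalized Z-curve components (x, y, z) of a DNA sequence.

    x contrasts purines (A, G) with pyrimidines (C, T),
    y contrasts amino bases (A, C) with keto bases (G, T),
    z contrasts weak (A, T) with strong (G, C) hydrogen bonding.
    Characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    a, c, g, t = (seq.count(base) for base in "ACGT")
    n = a + c + g + t
    if n == 0:
        return [0.0, 0.0, 0.0]
    x = ((a + g) - (c + t)) / n
    y = ((a + c) - (g + t)) / n
    z = ((a + t) - (g + c)) / n
    return [x, y, z]


def multi_window_z_curve(seq: str, window: int, step: int) -> List[float]:
    """Concatenate Z-curve components computed over sliding windows.

    `window` and `step` are hypothetical parameters chosen for illustration.
    """
    features: List[float] = []
    for start in range(0, len(seq) - window + 1, step):
        features.extend(z_curve(seq[start:start + window]))
    return features


if __name__ == "__main__":
    # Toy sequence: two non-overlapping windows of length 4.
    print(multi_window_z_curve("AAAACCCC", window=4, step=4))
```

A feature vector built this way could be passed to any standard classifier. Using several windows rather than one whole-sequence composition keeps some positional information, which is plausibly why a multi-window scheme suits promoters, whose signal elements sit at characteristic distances from the TSS.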