DPProm: A Two-Layer Predictor for Identifying Promoters and Their Types on Phage Genome Using Deep Learning.

Publication information

IEEE J Biomed Health Inform. 2022 Oct;26(10):5258-5266. doi: 10.1109/JBHI.2022.3193224. Epub 2022 Oct 4.

Abstract

With the number of phage genomes increasing, new bioinformatics methods for phage genome annotation are urgently needed. A promoter is a DNA region that is important for the regulation of gene transcription. In the post-genomic era, the availability of data makes it possible to build robust computational models for promoter identification. In this work, we introduce DPProm, a two-layer model composed of DPProm-1L and DPProm-2L, to predict promoters and their types in phages. In the first layer, DPProm-1L, a dual-channel deep neural network ensemble method that fuses multi-view features (a sequence feature and a handcrafted feature), identifies whether a DNA sequence is a promoter or a non-promoter. The sequence feature is extracted with a convolutional neural network (CNN), and the handcrafted feature combines free energy, GC content, cumulative skew, and Z-curve features. In the second layer, DPProm-2L, which is based on a CNN, is trained to predict the type of each promoter (host or phage). To enable prediction on whole genomes, DPProm is combined with a novel sequence data processing workflow that contains sliding-window and sequence-merging modules. Experimental results show that DPProm outperforms state-of-the-art methods and effectively decreases the false positive rate in whole-genome prediction. Furthermore, we provide a user-friendly website at http://bioinfo.ahu.edu.cn/DPProm. We expect that DPProm can serve as a useful tool for identifying promoters and their types.
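The abstract names several handcrafted features and a sliding-window and merging workflow without giving their formulas or parameters. The following is a minimal Python sketch of what such components could look like, not the authors' implementation. The function names, the per-base form of the cumulative skew, the window and step sizes, and the overlap-based merging rule are all assumptions made for illustration; the Z-curve coordinates follow the standard textbook definition. The free-energy feature is omitted because it requires a dinucleotide thermodynamic table that the abstract does not specify.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cumulative_gc_skew(seq: str) -> List[int]:
    """Running value of (#G - #C) along the sequence.

    Assumption: a simple per-base variant; the paper may use a windowed form.
    """
    total = 0
    skew = []
    for base in seq.upper():
        if base == "G":
            total += 1
        elif base == "C":
            total -= 1
        skew.append(total)
    return skew


def z_curve(seq: str) -> Tuple[int, int, int]:
    """Z-curve coordinates at the end of the sequence (standard definition).

    x: purine (A, G) minus pyrimidine (C, T)
    y: amino (A, C) minus keto (G, T)
    z: weak H-bond (A, T) minus strong H-bond (G, C)
    """
    seq = seq.upper()
    a, c, g, t = (seq.count(b) for b in "ACGT")
    return ((a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c))


def sliding_windows(length: int, window: int, step: int) -> List[Tuple[int, int]]:
    """Half-open (start, end) windows covering a genome of the given length."""
    return [(s, s + window) for s in range(0, length - window + 1, step)]


def merge_intervals(intervals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or touching intervals, e.g. windows predicted positive."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

In a whole-genome setting, one would presumably score each window from `sliding_windows` with the first-layer classifier and pass only the windows predicted as promoters to `merge_intervals`, so that adjacent hits are reported as a single candidate region.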

