School of Science, Dalian Maritime University, China.
Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Australia.
Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa299.
A promoter is a region of DNA, typically located proximal to the transcription start site (TSS), that defines where RNA polymerase initiates transcription of a gene. Correctly identifying the TSS and the core promoter of a gene is essential for understanding its transcriptional regulation. As a complement to conventional experimental methods, computational techniques delivered through easy-to-use platforms can be effectively applied to annotate the functions and physiological roles of promoters. In this work, we propose a deep learning-based method termed Depicter (Deep learning for predicting promoter) for identifying three specific types of promoters: promoter sequences with the TATA-box (TATA model), promoter sequences without the TATA-box (non-TATA model), and indistinguishable promoters (TATA and non-TATA model). Depicter is developed on an up-to-date, species-specific dataset that includes Homo sapiens, Mus musculus, Drosophila melanogaster and Arabidopsis thaliana promoters. The prediction model of Depicter is a convolutional neural network coupled with capsule layers. Extensive benchmarking and independent tests demonstrate that Depicter achieves improved predictive performance compared with several state-of-the-art methods. The Depicter webserver is freely accessible at https://depicter.erc.monash.edu/.
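To make the input side of such a model concrete, the following minimal sketch shows two steps that commonly precede a convolutional network on DNA: one-hot encoding a sequence into a position-by-base matrix, and a naive check for a TATA-box motif. It is not Depicter's implementation. The function names, the channel order, the example sequence and the simplified consensus TATA(A/T)A(A/T) are all assumptions made for illustration, and Depicter itself learns the TATA versus non-TATA distinction from data rather than from a fixed pattern.

```python
import re

# Channel order for the one-hot encoding (an arbitrary choice for this sketch).
BASES = "ACGT"

# Simplified TATA-box consensus TATA(A/T)A(A/T); an assumption for illustration,
# not the criterion used to build the Depicter dataset.
TATA_PATTERN = re.compile(r"TATA[AT]A[AT]")


def one_hot_encode(sequence):
    """Return a len(sequence) x 4 matrix; unknown bases (e.g. N) become all zeros."""
    rows = []
    for base in sequence.upper():
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


def has_tata_box(sequence):
    """Return True if the simplified consensus occurs anywhere in the sequence."""
    return TATA_PATTERN.search(sequence.upper()) is not None


if __name__ == "__main__":
    example = "GGCTATAAAAGGCN"  # made-up sequence, not from the Depicter dataset
    encoded = one_hot_encode(example)
    print(len(encoded), "positions x", len(encoded[0]), "channels")
    print("TATA-box found:", has_tata_box(example))
```

The resulting matrix is the kind of fixed-width numeric input that a convolutional layer can scan for motifs. The regular-expression check only illustrates what separates the TATA and non-TATA categories at the sequence level.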