Computational identification of eukaryotic promoters based on cascaded deep capsule neural networks.

Affiliations

School of Science, Dalian Maritime University, China.

Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Australia.

Publication information

Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa299.

Abstract

A promoter is a region of DNA, typically located proximal to the transcription start site (TSS), that defines where RNA polymerase initiates transcription of a gene. Correctly identifying the gene TSS and the core promoter is essential for understanding the transcriptional regulation of genes. As a complement to conventional experimental methods, computational techniques delivered through easy-to-use platforms can be effectively applied to annotate the functions and physiological roles of promoters. In this work, we propose a deep learning-based method termed Depicter (Deep learning for predicting promoter) for identifying three specific types of promoters: promoter sequences with the TATA-box (TATA model), promoter sequences without the TATA-box (non-TATA model), and indistinguishable promoters (TATA and non-TATA model). Depicter is developed on an up-to-date, species-specific dataset that includes Homo sapiens, Mus musculus, Drosophila melanogaster and Arabidopsis thaliana promoters. A convolutional neural network coupled with capsule layers is proposed to train and optimize the prediction model of Depicter. Extensive benchmarking and independent tests demonstrate that Depicter achieves improved predictive performance compared with several state-of-the-art methods. The webserver of Depicter is implemented and freely accessible at https://depicter.erc.monash.edu/.
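
As a rough illustration of the two building blocks that a "convolutional network coupled with capsule layers" typically relies on, the sketch below one-hot encodes a DNA sequence (a common input format for convolutional layers) and applies the standard capsule "squash" nonlinearity. This is not Depicter's implementation: the abstract does not specify the encoding, layer sizes, or the exact squash form, so the function names `one_hot` and `squash`, the `ACGT` channel order, and the handling of unknown bases are all assumptions made for illustration.

```python
import math

# Channel order for the one-hot encoding (an arbitrary choice for this sketch).
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows.

    Bases outside ACGT (for example N) become all-zero rows; this is an
    assumption of the sketch, not something stated in the abstract.
    """
    rows = []
    for base in seq.upper():
        row = [0.0] * len(ALPHABET)
        idx = ALPHABET.find(base)
        if idx >= 0:
            row[idx] = 1.0
        rows.append(row)
    return rows


def squash(vec, eps=1e-9):
    """Standard capsule squash: keep the direction, map the length into [0, 1).

    v = (|s|^2 / (1 + |s|^2)) * (s / |s|)
    """
    sq_norm = sum(x * x for x in vec)
    norm = math.sqrt(sq_norm)
    scale = sq_norm / (1.0 + sq_norm) / (norm + eps)
    return [scale * x for x in vec]


# Hypothetical inputs chosen only to exercise the two helpers.
encoded = one_hot("TATAAN")
capsule = squash([3.0, 4.0])
length = math.sqrt(sum(x * x for x in capsule))
```

In a capsule layer, the length of the squashed vector is usually read as the probability that the entity the capsule represents (here, hypothetically, a promoter class) is present, which is why the nonlinearity bounds the length below 1 while preserving direction.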
