Generic eukaryotic core promoter prediction using structural features of DNA.

Authors

Abeel Thomas, Saeys Yvan, Bonnet Eric, Rouzé Pierre, Van de Peer Yves

Affiliation

Department of Plant Systems Biology, Flanders Institute for Biotechnology (VIB), 9052 Gent, Belgium.

Publication

Genome Res. 2008 Feb;18(2):310-23. doi: 10.1101/gr.6991408. Epub 2007 Dec 20.

DOI:10.1101/gr.6991408

PMID:18096745

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2203629/

Abstract

Despite many recent efforts, in silico identification of promoter regions is still in its infancy. However, the accurate identification and delineation of promoter regions is important for several reasons, such as improving genome annotation and devising experiments to study and understand transcriptional regulation. Current methods to identify the core region of promoters require large amounts of high-quality training data and often behave like black box models that output predictions that are difficult to interpret. Here, we present a novel approach for predicting promoters in whole-genome sequences by using large-scale structural properties of DNA. Our technique requires no training, is applicable to many eukaryotic genomes, and performs extremely well in comparison with the best available promoter prediction programs. Moreover, it is fast, simple in design, and has no size constraints, and the results are easily interpretable. We compared our approach with 14 current state-of-the-art implementations using human gene and transcription start site data and analyzed the ENCODE region in more detail. We also validated our method on 12 additional eukaryotic genomes, including vertebrates, invertebrates, plants, fungi, and protists.
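The abstract does not say which structural property is used or how predictions are called, so the following is only a minimal sketch of the general idea of scoring a sequence with a per-dinucleotide structural scale, smoothing the profile, and flagging windows above a cutoff. The function names, the window size, the threshold, and `TOY_TABLE` are all hypothetical; the toy table is a made-up stand-in and not a published structural scale.

```python
from typing import Dict, List


def structural_profile(seq: str, table: Dict[str, float]) -> List[float]:
    """Map each overlapping dinucleotide of `seq` to its value in `table`.

    Raises KeyError for dinucleotides missing from `table` (e.g. containing N).
    """
    seq = seq.upper()
    return [table[seq[i:i + 2]] for i in range(len(seq) - 1)]


def moving_average(values: List[float], window: int) -> List[float]:
    """Smooth a profile with a sliding mean over `window` consecutive values."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    smoothed = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        smoothed.append(total / window)
    return smoothed


def candidate_positions(seq: str, table: Dict[str, float],
                        window: int, threshold: float) -> List[int]:
    """Return start indices of windows whose mean score exceeds `threshold`."""
    smoothed = moving_average(structural_profile(seq, table), window)
    return [i for i, value in enumerate(smoothed) if value > threshold]


# Hypothetical toy scale (NOT real structural data): 1.0 for dinucleotides
# made only of G/C, 0.0 otherwise. A real run would substitute a published
# dinucleotide scale for the property of interest.
TOY_TABLE = {a + b: (1.0 if a in "GC" and b in "GC" else 0.0)
             for a in "ACGT" for b in "ACGT"}

hits = candidate_positions("ATATATGCGCGCATAT", TOY_TABLE, window=4, threshold=0.5)
print(hits)
```

Because the scale is passed in as an argument, the same three functions can be reused with any per-dinucleotide table, and no model fitting step is involved, which is in the spirit of the training-free design the abstract describes.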
