G4PromFinder: an algorithm for predicting transcription promoters in GC-rich bacterial genomes based on AT-rich elements and G-quadruplex motifs.

Affiliations

Department of Biological and Environmental Sciences and Technologies, University of Salento, Lecce, Italy.

Institute of Biomedical Technologies National Research Council, Milan, Segrate, Italy.

Publication information

BMC Bioinformatics. 2018 Feb 6;19(1):36. doi: 10.1186/s12859-018-2049-x.

DOI: 10.1186/s12859-018-2049-x
PMID: 29409441
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5801747/
Abstract

BACKGROUND

Over the last few decades, computational genomics has contributed tremendously to deciphering biology from genome sequences and related data. Considerable effort has been devoted to predicting transcription promoter and terminator sites, which are the essential "punctuation marks" of DNA transcription. Computational prediction of promoters in prokaryotes nevertheless remains far from solved. Most published bacterial promoter prediction tools rely on a consensus-sequence search and were designed specifically for vegetative σ70 promoters; they are therefore not suitable for promoter prediction in bacteria that encode many σ factors, such as actinomycetes.

RESULTS

In this study we investigated whether putative promoters in prokaryotes can be identified from evolutionarily conserved motifs, focusing on GC-rich bacteria, in which promoter prediction with conventional, consensus-based algorithms is often not exhaustive. Here we introduce G4PromFinder, a novel algorithm that predicts putative promoters based on AT-rich elements and G-quadruplex DNA motifs. We tested its performance using available genomic and transcriptomic data for the model microorganisms Streptomyces coelicolor A3(2) and Pseudomonas aeruginosa PA14. We compared our results with those of three currently available promoter prediction algorithms: the σ70 consensus-based PePPER, the σ-factor consensus-based bTSSfinder, and PromPredict, which is based on double-helix DNA stability. G4PromFinder proved more suitable than the three reference tools for both genomes: it achieved the highest accuracy (F-scores of 0.61 and 0.53 in the two genomes), compared with 0.46 and 0.48 for the next-best tool, PromPredict. Consensus-based algorithms performed worse on the GC-rich genomes analyzed.
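The idea of pairing an AT-rich element with a nearby G-quadruplex motif can be illustrated with a minimal Python sketch. It is not the published G4PromFinder implementation: the window length, AT threshold, upstream distance, the regular expression, and the names `find_candidates` and `at_fraction` are all assumptions chosen for illustration, since the abstract does not state the actual parameters.

```python
import re

# Illustrative parameters (assumptions, not taken from the abstract).
WINDOW = 25    # length of the AT-rich window, in bp
MIN_AT = 0.60  # minimum A+T fraction for a window to count as AT-rich
UPSTREAM = 50  # how far upstream of the window to look for a G4 motif

# A common textbook-style pattern for a putative G-quadruplex:
# four G-runs (or C-runs, for the opposite strand) separated by short loops.
G4_PATTERN = re.compile(
    r"(?:G{2,4}[ACGT]{1,10}){3}G{2,4}|(?:C{2,4}[ACGT]{1,10}){3}C{2,4}"
)


def at_fraction(seq):
    """Return the fraction of A and T bases in seq."""
    return sum(base in "AT" for base in seq) / len(seq)


def find_candidates(genome):
    """Return (window_start, g4_start) pairs for AT-rich windows
    that have a putative G4 motif ending within UPSTREAM bp upstream."""
    genome = genome.upper()
    hits = []
    for start in range(len(genome) - WINDOW + 1):
        window = genome[start:start + WINDOW]
        if at_fraction(window) < MIN_AT:
            continue
        # Search only the region immediately upstream of the window.
        region_start = max(0, start - UPSTREAM)
        match = G4_PATTERN.search(genome, region_start, start)
        if match:
            hits.append((start, match.start()))
    return hits


if __name__ == "__main__":
    demo = "GGGAGGGAGGGAGGG" + "CCCCC" + "AT" * 12 + "A" + "GCGCGCGC"
    for window_start, g4_start in find_candidates(demo):
        print(window_start, g4_start)
```

Overlapping windows are reported individually here; a real tool would presumably merge or score them, which this sketch leaves out.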

CONCLUSIONS

Our analysis shows that G4PromFinder is a powerful tool for promoter search in GC-rich bacteria, especially those encoding many σ factors, such as the model microorganism S. coelicolor A3(2). Moreover, consensus-based tools and, more generally, tools based on specific features of bacterial σ factors appear to perform less well for promoter prediction in these types of bacterial genomes.
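The tool comparison rests on F-scores. The abstract does not define the metric, so the sketch below assumes the usual F1 definition, the harmonic mean of precision and recall. The function name `f_score` and the counts in the usage line are hypothetical and are not data from the study.

```python
def f_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall (F1), assuming that is
    the F-score meant in the abstract. Returns 0.0 when undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if true_positives == 0 or predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts for one tool on one genome.
    print(round(f_score(50, 25, 25), 2))
```

Because F1 penalises both missed promoters and false predictions, a tool that predicts very few promoters cannot score well on precision alone.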

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65bf/5801747/17bbc497e522/12859_2018_2049_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65bf/5801747/9ae6d4c0f47e/12859_2018_2049_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/65bf/5801747/0dfbbe62f46b/12859_2018_2049_Fig2_HTML.jpg
