Effective transcription factor binding site prediction using a combination of optimization, a genetic algorithm and discriminant analysis to capture distant interactions.

Authors

Levitsky Victor G, Ignatieva Elena V, Ananko Elena A, Turnaev Igor I, Merkulova Tatyana I, Kolchanov Nikolay A, Hodgman T C

Affiliation

Institute of Cytology and Genetics SB RAS, Novosibirsk, 630090, Russia.

Publication

BMC Bioinformatics. 2007 Dec 19;8:481. doi: 10.1186/1471-2105-8-481.

DOI:10.1186/1471-2105-8-481
PMID:18093302
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2265442/
Abstract

BACKGROUND

Reliable transcription factor binding site (TFBS) prediction methods are essential for computer annotation of large amounts of genome sequence data. However, current TFBS prediction methods are hampered by the high false-positive rates that occur when only sequence conservation at the core binding sites is considered.

RESULTS

To improve this situation, we quantified the performance of several Position Weight Matrix (PWM) algorithms, using exhaustive approaches to find their optimal length and position. We applied these approaches to biomedically important TFBSs involved in the regulation of cell growth and proliferation and in inflammatory, immune, and antiviral responses (NF-kappaB, ISGF3, IRF1, STAT1), in obesity and lipid metabolism (PPAR, SREBP, HNF4), and in the regulation of steroidogenic (SF-1) and cell cycle (E2F) gene expression. We also gained extra specificity using a method called SiteGA, which takes into account structural interactions within the TFBS core and flanking regions, using a genetic algorithm (GA) with a discriminant function of locally positioned dinucleotide (LPD) frequencies. To increase confidence in our approach, we compared the methods using jackknife resampling and bootstrap tests; optimized PWMs and SiteGA showed similar recognition performance. We then applied SiteGA and optimized PWMs (both separately and together) to sequences in the Eukaryotic Promoter Database (EPD). The resulting SiteGA recognition models can now be used to search sequences for binding sites using the web tool SiteGA. Analysis of the dependencies between close and distant LPDs revealed by the SiteGA models showed that the most significant correlations are between close LPDs and are generally located in the core (footprint) region. A greater number of less significant correlations are mainly between distant LPDs, which span both core and flanking regions. When SiteGA and optimized PWM models were applied together, false positives were substantially reduced, at least at higher stringencies.
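The two kinds of sequence features named above, PWM scores and locally positioned dinucleotide frequencies, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy aligned sites, the uniform background, the pseudocount, the threshold, and the helper names (`build_pwm`, `pwm_score`, `scan`, `lpd_frequency`) are all assumptions made for illustration, and the LPD definition used here (the frequency of one dinucleotide within a sub-window of a site) is a plausible reading of the abstract rather than a confirmed one.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Build a log-odds PWM from equal-length aligned sites (uniform background assumed)."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def pwm_score(pwm, window):
    """Sum the per-position log-odds weights for one window."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence, threshold):
    """Return (start, score) for every window scoring at or above the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = pwm_score(pwm, sequence[start:start + width])
        if score >= threshold:
            hits.append((start, score))
    return hits


def lpd_frequency(sequence, dinucleotide, start, end):
    """Frequency of one dinucleotide among the dinucleotides starting at start..end-1."""
    positions = range(start, min(end, len(sequence) - 1))
    if len(positions) == 0:
        return 0.0
    count = sum(1 for i in positions if sequence[i:i + 2] == dinucleotide)
    return count / len(positions)


# Toy aligned sites; hypothetical data, not taken from the paper.
toy_sites = ["TGACGT", "TGACGA", "TGACGT", "TTACGT"]
toy_pwm = build_pwm(toy_sites)
print(scan(toy_pwm, "CCCTGACGTCCC", threshold=5.0))
print(lpd_frequency("ACGCGT", "CG", 1, 4))
```

A discriminant function in the spirit of the abstract would take many such LPD frequencies (different dinucleotides and sub-windows) as input variables, with a genetic algorithm choosing which sub-windows to use; that search step is omitted here.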

CONCLUSION

Based on this analysis, SiteGA adds substantial specificity even to optimized PWMs and may be considered for large-scale genome analysis. It adds to the range of techniques available for TFBS prediction, and the EPD analysis has produced a list of genes that appear to be regulated by the above TFs.
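One way to picture how applying two models together can reduce false positives is to keep only the positions that both models report. The sketch below assumes that "together" means a logical AND of the two hit lists, which the abstract does not state; the function name `combine_hits`, the `tolerance` parameter, and the hit positions are hypothetical.

```python
def combine_hits(pwm_hits, second_hits, tolerance=0):
    """Keep PWM hit positions that have a second-model hit within `tolerance` bp."""
    second = sorted(second_hits)
    return [
        pos for pos in sorted(pwm_hits)
        if any(abs(pos - other) <= tolerance for other in second)
    ]


# Hypothetical hit positions from two independent models on one sequence.
pwm_positions = [12, 87, 140, 305]
sitega_positions = [14, 140, 410]

print(combine_hits(pwm_positions, sitega_positions))               # exact agreement only
print(combine_hits(pwm_positions, sitega_positions, tolerance=2))  # allow a 2 bp offset
```

Requiring agreement can only shrink the prediction set, so any gain in specificity is traded against sites that one of the two models misses.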
