An efficient algorithm for improving structure-based prediction of transcription factor binding sites.

Author information

Farrel Alvin, Guo Jun-Tao

Affiliation

Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA.

Publication information

BMC Bioinformatics. 2017 Jul 17;18(1):342. doi: 10.1186/s12859-017-1755-0.

DOI: 10.1186/s12859-017-1755-0

PMID: 28715997

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5514533/

Abstract

BACKGROUND

Gene expression is regulated by transcription factors binding to specific target DNA sites. Understanding how and where transcription factors bind at genome scale is an essential step toward understanding gene regulatory networks. Previously, we developed a structure-based method for predicting transcription factor binding sites using an integrative energy function that combines a knowledge-based multibody potential and two atomic energy terms. Although the method performs well, it is not computationally efficient, because the number of binding sequences to be evaluated grows exponentially with binding-site length. In this paper, we present an efficient pentamer algorithm that splits DNA binding sequences into overlapping fragments, together with a simplified integrative energy function, for transcription factor binding site prediction.
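The cost argument above can be made concrete with a small sketch. Assume, purely for illustration, that the pentamer approach scores all 4^5 = 1,024 pentamers at each of the L - 4 overlapping windows of an L-bp site. The paper's exact bookkeeping may differ. Under that assumption, the number of evaluations grows linearly with site length rather than exponentially. The function names below are hypothetical.

```python
from typing import List

K = 5  # pentamer length used in the abstract


def split_into_kmers(seq: str, k: int = K) -> List[str]:
    """Return every overlapping k-mer of seq, one per start offset."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def full_length_evaluations(length: int) -> int:
    """Number of distinct DNA sequences of this length (4 choices per base)."""
    return 4 ** length


def pentamer_evaluations(length: int, k: int = K) -> int:
    """Assumed cost if all 4**k k-mers are scored at each overlapping window."""
    return (length - k + 1) * 4 ** k


if __name__ == "__main__":
    print(split_into_kmers("ACGTACGT"))  # ['ACGTA', 'CGTAC', 'GTACG', 'TACGT']
    for site_len in (8, 12, 16, 20):
        print(site_len, full_length_evaluations(site_len), pentamer_evaluations(site_len))
```

Consider a 20-bp site. Full-length enumeration would require 4^20 (about 1.1 × 10^12) sequence evaluations. Under the stated assumption, the pentamer approach requires 16 × 1,024 = 16,384.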

RESULTS

A DNA binding sequence is split into overlapping pentamers (5 base pairs) to calculate transcription factor-pentamer interaction energies. To combine the overlapping pentamer scores, we developed two methods, Kmer-Sum and PWM (Position Weight Matrix) stacking, for full-length binding motif prediction. Our results show that both Kmer-Sum and PWM stacking in the new pentamer approach, along with a simplified integrative energy function, improved transcription factor binding site prediction accuracy and dramatically reduced computation time, especially for longer binding sites.
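The two combination schemes named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It makes three assumptions:

- A precomputed table of TF-pentamer interaction energies exists for every window, where lower energy is more favorable.
- Kmer-Sum is the sum of the energies of a sequence's overlapping pentamers.
- PWM stacking converts each window's energies into a 5-column PWM by Boltzmann weighting, then averages the overlapping columns.

`toy_energy_tables`, `kmer_sum`, `window_pwm`, `stack_pwms`, and the temperature parameter `kT` are hypothetical names. The toy energy function stands in for the structure-based energy and simply rewards matches to a chosen consensus.

```python
import itertools
import math
from typing import Dict, List

K = 5
BASES = "ACGT"

# Hypothetical input format: energy_tables[i][kmer] is the TF-pentamer
# interaction energy of `kmer` placed at window i (lower = more favorable).
EnergyTables = List[Dict[str, float]]


def toy_energy_tables(consensus: str, k: int = K) -> EnergyTables:
    """Hypothetical stand-in for a structure-based energy: -1 per base matching consensus."""
    kmers = ["".join(p) for p in itertools.product(BASES, repeat=k)]
    return [
        {kmer: -float(sum(a == b for a, b in zip(kmer, consensus[i:i + k]))) for kmer in kmers}
        for i in range(len(consensus) - k + 1)
    ]


def kmer_sum(seq: str, energy_tables: EnergyTables, k: int = K) -> float:
    """Kmer-Sum as sketched here: total energy of all overlapping k-mers of seq."""
    n_windows = len(seq) - k + 1
    if n_windows != len(energy_tables):
        raise ValueError("sequence length does not match the number of windows")
    return sum(energy_tables[i][seq[i:i + k]] for i in range(n_windows))


def window_pwm(table: Dict[str, float], kT: float = 1.0) -> List[Dict[str, float]]:
    """Boltzmann-weight every k-mer at one window and collapse to per-position base frequencies."""
    e_min = min(table.values())  # shift energies so exp() cannot overflow
    weights = {kmer: math.exp(-(e - e_min) / kT) for kmer, e in table.items()}
    total = sum(weights.values())
    k = len(next(iter(table)))
    pwm = [dict.fromkeys(BASES, 0.0) for _ in range(k)]
    for kmer, w in weights.items():
        for j, base in enumerate(kmer):
            pwm[j][base] += w / total
    return pwm


def stack_pwms(energy_tables: EnergyTables, kT: float = 1.0) -> List[Dict[str, float]]:
    """Place each window PWM at its offset and average the columns that overlap."""
    k = len(next(iter(energy_tables[0])))
    length = len(energy_tables) + k - 1
    sums = [dict.fromkeys(BASES, 0.0) for _ in range(length)]
    counts = [0] * length
    for i, table in enumerate(energy_tables):
        for j, column in enumerate(window_pwm(table, kT)):
            for base in BASES:
                sums[i + j][base] += column[base]
            counts[i + j] += 1
    return [{b: col[b] / counts[p] for b in BASES} for p, col in enumerate(sums)]


if __name__ == "__main__":
    tables = toy_energy_tables("TGACGTCA")
    print(kmer_sum("TGACGTCA", tables))  # -20.0
    print(kmer_sum("TGACGTCC", tables))  # -19.0
    pwm = stack_pwms(tables)
    print("".join(max(col, key=col.get) for col in pwm))  # TGACGTCA
```

Once the per-window tables exist, Kmer-Sum is cheap: it is one table lookup per window. Stacking instead yields a single full-length PWM that can be scanned along a genome like any other motif. This sketch does not settle whether overlapping columns should be averaged or weighted differently.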

CONCLUSION

Our new fragment-based pentamer algorithm and simplified energy function improve both efficiency and accuracy. To our knowledge, this is the first fragment-based method for structure-based transcription factor binding site prediction.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/446a/5514533/888ff5c9a925/12859_2017_1755_Fig1_HTML.jpg

Similar articles

Reliable scaling of position weight matrices for binding strength comparisons between transcription factors.

BMC Bioinformatics. 2015 Aug 20;16:265. doi: 10.1186/s12859-015-0666-1.

A flexible integrative approach based on random forest improves prediction of transcription factor binding sites.

Nucleic Acids Res. 2012 Aug;40(14):e106. doi: 10.1093/nar/gks283. Epub 2012 Apr 5.

Structure-based prediction of transcription factor binding sites using a protein-DNA docking approach.

Proteins. 2008 Sep;72(4):1114-24. doi: 10.1002/prot.22002.

Structure-based prediction of transcription factor binding specificity using an integrative energy function.

Bioinformatics. 2016 Jun 15;32(12):i306-i313. doi: 10.1093/bioinformatics/btw264.

Metamotifs--a generative model for building families of nucleotide position weight matrices.

BMC Bioinformatics. 2010 Jun 25;11:348. doi: 10.1186/1471-2105-11-348.

Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data.

BMC Bioinformatics. 2015 Nov 9;16:375. doi: 10.1186/s12859-015-0797-4.

A DNA shape-based regulatory score improves position-weight matrix-based recognition of transcription factor binding sites.

Bioinformatics. 2015 Nov 1;31(21):3445-50. doi: 10.1093/bioinformatics/btv391. Epub 2015 Jun 30.

A subspace method for the detection of transcription factor binding sites.

Bioinformatics. 2012 May 15;28(10):1328-35. doi: 10.1093/bioinformatics/bts147. Epub 2012 Mar 29.

Alignment-free clustering of transcription factor binding motifs using a genetic-k-medoids approach.

BMC Bioinformatics. 2015 Jan 28;16:22. doi: 10.1186/s12859-015-0450-2.

Cited by

Predicting Transcription Factor Binding Sites with Deep Learning.

Int J Mol Sci. 2024 May 3;25(9):4990. doi: 10.3390/ijms25094990.

A Counterintuitive Neutrophil-Mediated Pattern in COVID-19 Patients Revealed through Transcriptomics Analysis.

Viruses. 2022 Dec 30;15(1):104. doi: 10.3390/v15010104.

Genome-wide identification and response stress expression analysis of the family in rubber tree (Hevea brasiliensis Muell. Arg.).

PeerJ. 2022 May 13;10:e13189. doi: 10.7717/peerj.13189. eCollection 2022.

Insights into protein-DNA interactions from hydrogen bond energy-based comparative protein-ligand analyses.

Proteins. 2022 Jun;90(6):1303-1314. doi: 10.1002/prot.26313. Epub 2022 Feb 14.

Dissecting Transcription Factor-Target Interaction in Bovine Coronavirus Infection.

Microorganisms. 2020 Aug 30;8(9):1323. doi: 10.3390/microorganisms8091323.

Beyond Trees: Regulons and Regulatory Motif Characterization.

Genes (Basel). 2020 Aug 25;11(9):995. doi: 10.3390/genes11090995.

An SVM-based method for assessment of transcription factor-DNA complex models.

BMC Bioinformatics. 2018 Dec 21;19(Suppl 20):506. doi: 10.1186/s12859-018-2538-y.
