Improved predictions of transcription factor binding sites using physicochemical features of DNA.

Affiliation

Department of Chemistry, University of Chicago, Chicago, IL 60637, USA.

Publication information

Nucleic Acids Res. 2012 Dec;40(22):e175. doi: 10.1093/nar/gks771. Epub 2012 Aug 25.

DOI:10.1093/nar/gks771

PMID:22923524

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3526315/

Abstract

Typical approaches for predicting transcription factor binding sites (TFBSs) involve use of a position-specific weight matrix (PWM) to statistically characterize the sequences of the known sites. Recently, an alternative physicochemical approach, called SiteSleuth, was proposed. In this approach, a linear support vector machine (SVM) classifier is trained to distinguish TFBSs from background sequences based on local chemical and structural features of DNA. SiteSleuth appears to generally perform better than PWM-based methods. Here, we improve the SiteSleuth approach by considering both new physicochemical features and algorithmic modifications. New features are derived from Gibbs energies of amino acid-DNA interactions and hydroxyl radical cleavage profiles of DNA. Algorithmic modifications consist of inclusion of a feature selection step, use of a nonlinear kernel in the SVM classifier, and use of a consensus-based post-processing step for predictions. We also considered SVM classification based on letter features alone to distinguish performance gains from use of SVM-based models versus use of physicochemical features. The accuracy of each of the variant methods considered was assessed by cross validation using data available in the RegulonDB database for 54 Escherichia coli TFs, as well as by experimental validation using published ChIP-chip data available for Fis and Lrp.
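
As a point of reference for the PWM baseline and the "letter features" mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes a uniform background frequency, a pseudocount of 1, and a set of made-up aligned sites (`toy_sites`). The function names (`build_pwm`, `score`, `best_window`, `one_hot`) are hypothetical, and the SVM training, physicochemical features, feature selection, and consensus post-processing described in the abstract are not reproduced here.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position-specific weight matrix from aligned sites.

    Assumptions (not taken from the paper): uniform background frequency
    and an additive pseudocount applied to every base count.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        column = [s[pos] for s in sites]
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def score(pwm, seq):
    """Sum the per-position log-odds for a sequence as long as the PWM."""
    return sum(col[base] for col, base in zip(pwm, seq))


def best_window(pwm, sequence):
    """Slide the PWM along a longer sequence; return (best score, offset)."""
    width = len(pwm)
    scored = [
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scored)


def one_hot(seq):
    """Encode a sequence as 'letter features': four 0/1 values per base."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]


# Made-up aligned binding sites, for illustration only.
toy_sites = ["TTGACA", "TTGACT", "TTTACA", "TAGACA"]
toy_pwm = build_pwm(toy_sites)
```

A vector such as `one_hot("TTGACA")` is the kind of sequence-only input a classifier could be trained on, which is how the abstract separates gains due to the SVM from gains due to physicochemical features.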

Similar articles

Improved predictions of transcription factor binding sites using physicochemical features of DNA.

Nucleic Acids Res. 2012 Dec;40(22):e175. doi: 10.1093/nar/gks771. Epub 2012 Aug 25.

Using sequence-specific chemical and structural properties of DNA to predict transcription factor binding sites.

PLoS Comput Biol. 2010 Nov 18;6(11):e1001007. doi: 10.1371/journal.pcbi.1001007.

A balancing act in transcription regulation by response regulators: titration of transcription factor activity by decoy DNA binding sites.

Nucleic Acids Res. 2021 Nov 18;49(20):11537-11549. doi: 10.1093/nar/gkab935.

MD-SVM: a novel SVM-based algorithm for the motif discovery of transcription factor binding sites.

BMC Bioinformatics. 2019 May 1;20(Suppl 7):200. doi: 10.1186/s12859-019-2735-3.

DP-BINDER: machine learning model for prediction of DNA-binding proteins by fusing evolutionary and physicochemical information.

J Comput Aided Mol Des. 2019 Jul;33(7):645-658. doi: 10.1007/s10822-019-00207-x. Epub 2019 May 23.

Tree-based position weight matrix approach to model transcription factor binding site profiles.

PLoS One. 2011;6(9):e24210. doi: 10.1371/journal.pone.0024210. Epub 2011 Sep 2.

Predicting transcription factor site occupancy using DNA sequence intrinsic and cell-type specific chromatin features.

BMC Bioinformatics. 2016 Jan 11;17 Suppl 1(Suppl 1):4. doi: 10.1186/s12859-015-0846-z.

A flexible integrative approach based on random forest improves prediction of transcription factor binding sites.

Nucleic Acids Res. 2012 Aug;40(14):e106. doi: 10.1093/nar/gks283. Epub 2012 Apr 5.

A high-order representation and classification method for transcription factor binding sites recognition in Escherichia coli.

Artif Intell Med. 2017 Jan;75:16-23. doi: 10.1016/j.artmed.2016.11.004. Epub 2016 Dec 1.

Direct and indirect effects of H-NS and Fis on global gene expression control in Escherichia coli.

Nucleic Acids Res. 2011 Mar;39(6):2073-91. doi: 10.1093/nar/gkq934. Epub 2010 Nov 21.

Cited by

A deterministic code for transcription factor-DNA recognition through computation of binding interfaces.

NAR Genom Bioinform. 2022 Mar 4;4(1):lqac008. doi: 10.1093/nargab/lqac008. eCollection 2022 Mar.

Transversions have larger regulatory effects than transitions.

BMC Genomics. 2017 May 19;18(1):394. doi: 10.1186/s12864-017-3785-4.

Unveiling DNA structural features of promoters associated with various types of TSSs in prokaryotic transcriptomes and their role in gene expression.

DNA Res. 2017 Feb 1;24(1):25-35. doi: 10.1093/dnares/dsw045.

Quantitative modeling of gene expression using DNA shape features of binding sites.

Nucleic Acids Res. 2016 Jul 27;44(13):e120. doi: 10.1093/nar/gkw446. Epub 2016 Jun 1.

Knowledge-based three-body potential for transcription factor binding site prediction.

IET Syst Biol. 2016 Feb;10(1):23-9. doi: 10.1049/iet-syb.2014.0066.

An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system.

BMC Bioinformatics. 2015 Nov 19;16:390. doi: 10.1186/s12859-015-0819-2.

Decoding the non-coding genome: elucidating genetic risk outside the coding genome.

Genes Brain Behav. 2016 Jan;15(1):187-204. doi: 10.1111/gbb.12269. Epub 2016 Jan 4.

Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast.

PLoS Comput Biol. 2015 Aug 20;11(8):e1004418. doi: 10.1371/journal.pcbi.1004418. eCollection 2015 Aug.

Deconvolving the recognition of DNA shape from sequence.

Cell. 2015 Apr 9;161(2):307-18. doi: 10.1016/j.cell.2015.02.008. Epub 2015 Apr 2.

Genome-wide analysis of transcription factor binding sites and their characteristic DNA structures.

BMC Genomics. 2015;16 Suppl 3(Suppl 3):S8. doi: 10.1186/1471-2164-16-S3-S8. Epub 2015 Jan 29.

References

All-atom empirical potential for molecular modeling and dynamics studies of proteins.

J Phys Chem B. 1998 Apr 30;102(18):3586-616. doi: 10.1021/jp973084f.

Predicting target DNA sequences of DNA-binding proteins based on unbound structures.

PLoS One. 2012;7(2):e30446. doi: 10.1371/journal.pone.0030446. Epub 2012 Feb 1.

Direct inference of protein-DNA interactions using compressed sensing methods.

Proc Natl Acad Sci U S A. 2011 Sep 6;108(36):14819-24. doi: 10.1073/pnas.1106460108. Epub 2011 Aug 8.

A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions.

Structure. 2011 Jun 8;19(6):844-58. doi: 10.1016/j.str.2011.03.019.

Mapping and analysis of chromatin state dynamics in nine human cell types.

Nature. 2011 May 5;473(7345):43-9. doi: 10.1038/nature09906. Epub 2011 Mar 23.

Biophysics: Flipping Watson and Crick.

Nature. 2011 Feb 24;470(7335):472-3. doi: 10.1038/470472a.

Transient Hoogsteen base pairs in canonical duplex DNA.

Nature. 2011 Feb 24;470(7335):498-502. doi: 10.1038/nature09775. Epub 2011 Jan 26.

Using sequence-specific chemical and structural properties of DNA to predict transcription factor binding sites.

PLoS Comput Biol. 2010 Nov 18;6(11):e1001007. doi: 10.1371/journal.pcbi.1001007.

RegulonDB version 7.0: transcriptional regulation of Escherichia coli K-12 integrated within genetic sensory response units (Gensor Units).

Nucleic Acids Res. 2011 Jan;39(Database issue):D98-105. doi: 10.1093/nar/gkq1110. Epub 2010 Nov 4.

Use of structural DNA properties for the prediction of transcription-factor binding sites in Escherichia coli.

Nucleic Acids Res. 2011 Jan;39(2):e6. doi: 10.1093/nar/gkq1071. Epub 2010 Nov 4.
