DNA sequence+shape kernel enables alignment-free modeling of transcription factor binding.

Affiliations

Department of Statistics, University of California Riverside, Riverside, CA 92521, USA.

Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, CA 90089, USA.

Publication information

Bioinformatics. 2017 Oct 1;33(19):3003-3010. doi: 10.1093/bioinformatics/btx336.

DOI:10.1093/bioinformatics/btx336

PMID:28541376

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5870879/

Abstract

MOTIVATION

Transcription factors (TFs) bind to specific DNA sequence motifs. Several lines of evidence suggest that TF-DNA binding is mediated in part by properties of the local DNA shape: the width of the minor groove, the relative orientations of adjacent base pairs, etc. Several methods have been developed to jointly account for DNA sequence and shape properties in predicting TF binding affinity. However, a limitation of these methods is that they typically require a training set of aligned TF binding sites.

RESULTS

We describe a sequence + shape kernel that leverages DNA sequence and shape information to better understand protein-DNA binding preference and affinity. This kernel extends an existing class of k-mer based sequence kernels, based on the recently described di-mismatch kernel. Using three in vitro benchmark datasets, derived from universal protein binding microarrays (uPBMs), genomic context PBMs (gcPBMs) and SELEX-seq data, we demonstrate that incorporating DNA shape information improves our ability to predict protein-DNA binding affinity. In particular, we observe that (i) the k-spectrum + shape model performs better than the classical k-spectrum kernel, particularly for small k values; (ii) the di-mismatch kernel performs better than the k-mer kernel, for larger k; and (iii) the di-mismatch + shape kernel performs better than the di-mismatch kernel for intermediate k values.
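
For readers unfamiliar with the baseline named in point (i), the classical k-spectrum kernel can be sketched as the inner product of two sequences' k-mer count vectors. The sketch below is illustrative only and is not the authors' implementation: the function names `kmer_counts` and `spectrum_kernel` are hypothetical, and the mismatch handling and DNA shape features that the paper adds on top of this baseline are deliberately omitted.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x: str, y: str, k: int) -> int:
    """Classical k-spectrum kernel: inner product of k-mer count vectors.

    No alignment of x and y is needed, because only k-mer content is compared.
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # A Counter returns 0 for k-mers it has not seen, so only shared
    # k-mers contribute to the sum.
    return sum(n * cy[kmer] for kmer, n in cx.items())


if __name__ == "__main__":
    # Two unaligned toy sequences that share the 2-mers AC, CG, GT and TA.
    print(spectrum_kernel("ACGTAC", "CGTACG", 2))
```

Because the kernel depends only on which k-mers occur and how often, it can be computed on unaligned probe sequences, which is the property the abstract refers to as alignment-free.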

AVAILABILITY AND IMPLEMENTATION

The software is available at https://bitbucket.org/wenxiu/sequence-shape.git.

CONTACT

rohs@usc.edu or william-noble@uw.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba9d/5870879/dd5c32edb8d7/btx336f1.jpg

Similar articles

Quantitative modeling of transcription factor binding specificities using DNA shape.

Proc Natl Acad Sci U S A. 2015 Apr 14;112(15):4654-9. doi: 10.1073/pnas.1422023112. Epub 2015 Mar 9.

High resolution models of transcription factor-DNA affinities improve in vitro and in vivo binding predictions.

PLoS Comput Biol. 2010 Sep 9;6(9):e1000916. doi: 10.1371/journal.pcbi.1000916.

DeFCoM: analysis and modeling of transcription factor binding sites using a motif-centric genomic footprinter.

Bioinformatics. 2017 Apr 1;33(7):956-963. doi: 10.1093/bioinformatics/btw740.

A DNA shape-based regulatory score improves position-weight matrix-based recognition of transcription factor binding sites.

Bioinformatics. 2015 Nov 1;31(21):3445-50. doi: 10.1093/bioinformatics/btv391. Epub 2015 Jun 30.

BEESEM: estimation of binding energy models using HT-SELEX data.

Bioinformatics. 2017 Aug 1;33(15):2288-2295. doi: 10.1093/bioinformatics/btx191.

Stability selection for regression-based models of transcription factor-DNA binding specificity.

Bioinformatics. 2013 Jul 1;29(13):i117-25. doi: 10.1093/bioinformatics/btt221.

Predicting in-vitro Transcription Factor Binding Sites Using DNA Sequence + Shape.

IEEE/ACM Trans Comput Biol Bioinform. 2021 Mar-Apr;18(2):667-676. doi: 10.1109/TCBB.2019.2947461. Epub 2021 Apr 6.

BinDNase: a discriminatory approach for transcription factor binding prediction using DNase I hypersensitivity data.

Bioinformatics. 2015 Sep 1;31(17):2852-9. doi: 10.1093/bioinformatics/btv294. Epub 2015 May 7.

Improved linking of motifs to their TFs using domain information.

Bioinformatics. 2020 Mar 1;36(6):1655-1662. doi: 10.1093/bioinformatics/btz855.

Cited by

Discovering DNA shape motifs with multiple DNA shape features: generalization, methods, and validation.

Nucleic Acids Res. 2024 May 8;52(8):4137-4150. doi: 10.1093/nar/gkae210.

Predicting transcription factor binding sites using DNA shape features based on shared hybrid deep learning architecture.

Mol Ther Nucleic Acids. 2021 Feb 18;24:154-163. doi: 10.1016/j.omtn.2021.02.014. eCollection 2021 Jun 4.

Landscape of DNA binding signatures of myocyte enhancer factor-2B reveals a unique interplay of base and shape readout.

Nucleic Acids Res. 2020 Sep 4;48(15):8529-8544. doi: 10.1093/nar/gkaa642.

Deciphering the Gene Regulatory Landscape Encoded in DNA Biophysical Features.

iScience. 2019 Nov 22;21:638-649. doi: 10.1016/j.isci.2019.10.055. Epub 2019 Oct 31.

Co-SELECT reveals sequence non-specific contribution of DNA shape to transcription factor binding in vitro.

Nucleic Acids Res. 2019 Jul 26;47(13):6632-6641. doi: 10.1093/nar/gkz540.

A De Novo Shape Motif Discovery Algorithm Reveals Preferences of Transcription Factors for DNA Shape Beyond Sequence Motifs.

Cell Syst. 2019 Jan 23;8(1):27-42.e6. doi: 10.1016/j.cels.2018.12.001. Epub 2019 Jan 16.

A comprehensive review of computational prediction of genome-wide features.

Brief Bioinform. 2020 Jan 17;21(1):120-134. doi: 10.1093/bib/bby110.

Diversification of transcription factor-DNA interactions and the evolution of gene regulatory networks.

Wiley Interdiscip Rev Syst Biol Med. 2018 Sep;10(5):e1423. doi: 10.1002/wsbm.1423. Epub 2018 Apr 25.

A unified approach for quantifying and interpreting DNA shape readout by transcription factors.

Mol Syst Biol. 2018 Feb 22;14(2):e7902. doi: 10.15252/msb.20177902.

Expanding the repertoire of DNA shape features for genome-scale studies of transcription factor binding.

Nucleic Acids Res. 2017 Dec 15;45(22):12877-12887. doi: 10.1093/nar/gkx1145.

References

Transcription factor family-specific DNA shape readout revealed by quantitative specificity models.

Mol Syst Biol. 2017 Feb 6;13(2):910. doi: 10.15252/msb.20167238.

DNA Shape Features Improve Transcription Factor Binding Site Predictions In Vivo.

Cell Syst. 2016 Sep 28;3(3):278-286.e4. doi: 10.1016/j.cels.2016.07.001. Epub 2016 Aug 18.

How motif environment influences transcription factor search dynamics: Finding a needle in a haystack.

Bioessays. 2016 Jul;38(7):605-12. doi: 10.1002/bies.201600005. Epub 2016 May 19.

DNAshapeR: an R/Bioconductor package for DNA shape prediction and feature encoding.

Bioinformatics. 2016 Apr 15;32(8):1211-3. doi: 10.1093/bioinformatics/btv735. Epub 2015 Dec 14.

A widespread role of the motif environment in transcription factor binding across diverse protein families.

Genome Res. 2015 Sep;25(9):1268-80. doi: 10.1101/gr.184671.114. Epub 2015 Jul 9.

Deconvolving the recognition of DNA shape from sequence.

Cell. 2015 Apr 9;161(2):307-18. doi: 10.1016/j.cell.2015.02.008. Epub 2015 Apr 2.

Quantitative modeling of transcription factor binding specificities using DNA shape.

Proc Natl Acad Sci U S A. 2015 Apr 14;112(15):4654-9. doi: 10.1073/pnas.1422023112. Epub 2015 Mar 9.

Unraveling determinants of transcription factor binding outside the core binding site.

Genome Res. 2015 Jul;25(7):1018-29. doi: 10.1101/gr.185033.114. Epub 2015 Mar 11.

Low affinity binding site clusters confer hox specificity and regulatory robustness.

Cell. 2015 Jan 15;160(1-2):191-203. doi: 10.1016/j.cell.2014.11.041. Epub 2014 Dec 31.

TFBSshape: a motif database for DNA shape features of transcription factor binding sites.

Nucleic Acids Res. 2014 Jan;42(Database issue):D148-55. doi: 10.1093/nar/gkt1087. Epub 2013 Nov 7.
