Comparison of discriminative motif optimization using matrix and DNA shape-based models.

Author information

Department of Genetics and Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, 63110, USA.

Publication information

BMC Bioinformatics. 2018 Mar 6;19(1):86. doi: 10.1186/s12859-018-2104-7.

DOI:10.1186/s12859-018-2104-7

PMID:29510689

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5840810/

Abstract

BACKGROUND

Transcription factor (TF) binding site specificity is commonly represented by some form of matrix model in which the positions in the binding site are assumed to contribute independently to the site's activity. The independence assumption is known to be an approximation, often a good one but sometimes poor. Alternative approaches have been developed that use k-mers (DNA "words" of length k) to account for the non-independence, and more recently DNA structural parameters have been incorporated into the models. ChIP-seq data are often used to assess the discriminatory power of motifs and to compare different models. However, to measure the improvement due to using more complex models, one must compare to optimized matrix models.
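The additive assumption described above can be illustrated with a minimal sketch. The 3-position matrix, its weights, and the function names below are hypothetical and chosen only for illustration; they are not taken from the paper or from any motif database. Each position contributes its weight independently, and a longer sequence is scored by its best-scoring window.

```python
from typing import Dict, List

# Hypothetical 3-position PWM (illustrative weights only):
# one dict of base -> weight per motif position.
PWM: List[Dict[str, float]] = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]


def site_score(pwm: List[Dict[str, float]], site: str) -> float:
    """Additive score: sum of per-position weights, with no interaction terms."""
    return sum(column[base] for column, base in zip(pwm, site))


def best_window_score(pwm: List[Dict[str, float]], seq: str) -> float:
    """Score a longer sequence by its best-scoring window of motif length."""
    width = len(pwm)
    return max(
        site_score(pwm, seq[i:i + width])
        for i in range(len(seq) - width + 1)
    )
```

Because the score is a plain sum over positions, this model cannot express a preference that depends on two positions jointly, which is the limitation that k-mer and di-nucleotide extensions are meant to address.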

RESULTS

We describe a program "Discriminative Additive Model Optimization" (DAMO) that uses positive and negative examples, as in ChIP-seq data, and finds the additive position weight matrix (PWM) that maximizes the Area Under the Receiver Operating Characteristic Curve (AUROC). We compare to a recent study where structural parameters, serving as features in a gradient boosting classifier algorithm, are shown to improve the AUROC over JASPAR position frequency matrices (PFMs). In agreement with the previous results, we find that adding structural parameters gives the largest improvement, but most of the gain can be obtained by an optimized PWM and nearly all of the gain can be obtained with a di-nucleotide extension to the PWM.
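As a rough illustration of the objective described above, the sketch below computes the AUROC of a PWM on positive and negative sequences. It is not DAMO's implementation: the matrix, the toy sequences, and all function names are assumptions, and the optimization step that searches for the best matrix is omitted. The AUROC is computed directly as the fraction of positive/negative pairs in which the positive sequence scores higher, with ties counted as one half.

```python
from typing import Dict, List, Sequence

Pwm = List[Dict[str, float]]


def best_window_score(pwm: Pwm, seq: str) -> float:
    """Best additive PWM score over all windows of motif length in seq."""
    width = len(pwm)
    return max(
        sum(col[base] for col, base in zip(pwm, seq[i:i + width]))
        for i in range(len(seq) - width + 1)
    )


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


def pwm_auroc(pwm: Pwm, positives: Sequence[str], negatives: Sequence[str]) -> float:
    """AUROC obtained when sequences are ranked by their best PWM window score."""
    pos = [best_window_score(pwm, s) for s in positives]
    neg = [best_window_score(pwm, s) for s in negatives]
    return auroc(pos, neg)


# Hypothetical toy data (illustrative only, not from the paper).
toy_pwm: Pwm = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
]
toy_positives = ["TTACGTT", "ACGAAAA"]
toy_negatives = ["TTTTTTT", "CCCCCCC"]
toy_auc = pwm_auroc(toy_pwm, toy_positives, toy_negatives)
```

A discriminative optimizer would treat the matrix weights as free parameters and adjust them to increase this value; the pairwise loop here is quadratic in the number of sequences and is meant for clarity rather than speed.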

CONCLUSION

To appropriately compare different models for TF binding sites, optimized models must be used. PWMs and their extensions are good representations of binding specificity for most TFs, and more complex models, including the incorporation of DNA shape features and gradient boosting classifiers, provide only moderate improvements for a few TFs.
