Discriminative motif analysis of high-throughput dataset.

Affiliations

Human Biology Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, Molecular and Cellular Biology Program, University of Washington, Seattle, Washington, 98105, USA, Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, Department of Pediatrics, School of Medicine, Department of Neurology, School of Medicine, University of Washington, Seattle, Washington, 98105, USA, Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, Department of Computer Science and Engineering, Department of Genome Sciences, University of Washington, Seattle, Washington, 98105, USA and Bioinformatics and Computational Biology, Genentech, South San Francisco, CA 94080, USA.

Publication information

Bioinformatics. 2014 Mar 15;30(6):775-83. doi: 10.1093/bioinformatics/btt615. Epub 2013 Oct 25.

DOI: 10.1093/bioinformatics/btt615

PMID: 24162561

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3957073/

Abstract

MOTIVATION

High-throughput ChIP-seq studies typically identify thousands of peaks for a single transcription factor (TF). It is common for traditional motif discovery tools to predict motifs that are statistically significant against a naïve background distribution but are of questionable biological relevance.

RESULTS

We describe a simple algorithm for discovering differential motifs between two sequence datasets that is effective at eliminating systematic biases and scales to large datasets. Tested on 207 ENCODE ChIP-seq datasets, our method identifies correct motifs in 78% of the datasets with known motifs, demonstrating improvements in both accuracy and efficiency over DREME, another state-of-the-art discriminative motif discovery tool. More interestingly, on the remaining, more challenging datasets, we identify common technical or biological factors that compromise the motif search results and use advanced features of our tool to control for these factors. We also present case studies demonstrating the ability of our method to detect single-base-pair differences in the DNA-binding specificity of two similar TFs. Lastly, we demonstrate the discovery of key TF motifs involved in tissue specification by examining high-throughput DNase accessibility data.
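The abstract does not spell out how motifRG scores candidate motifs. As a generic illustration of the discriminative idea (score a candidate by how much more often it occurs in a foreground set than in a background set, so that features shared by both sets score near zero), the minimal sketch below ranks exact k-mers with a pooled two-proportion z-test. This is an assumption-laden toy, not motifRG's algorithm or API: the function names, k = 6, the z-test, and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter

# Complement table for A/C/G/T; used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Collapse a k-mer and its reverse complement to one strand-independent key."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_presence(seqs: list[str], k: int) -> Counter:
    """Count how many sequences contain each canonical k-mer at least once."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        seen = set()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= set("ACGT"):  # skip windows containing N or other symbols
                seen.add(canonical(word))
        counts.update(seen)  # each sequence contributes at most 1 per k-mer
    return counts


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-proportion z-score for x1/n1 (foreground) vs x2/n2 (background)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return 0.0 if se == 0 else (x1 / n1 - x2 / n2) / se


def rank_discriminative_kmers(fg: list[str], bg: list[str], k: int = 6, top: int = 5):
    """Rank canonical k-mers by enrichment in foreground over background sequences."""
    fg_counts = kmer_presence(fg, k)
    bg_counts = kmer_presence(bg, k)
    scored = [
        (kmer, two_proportion_z(fg_counts[kmer], len(fg), bg_counts[kmer], len(bg)))
        for kmer in set(fg_counts) | set(bg_counts)
    ]
    scored.sort(key=lambda item: (-item[1], item[0]))  # highest z first, ties by name
    return scored[:top]


def random_dna(rng: random.Random, length: int) -> str:
    """Uniform random DNA string (synthetic data for the demo only)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


# Toy demo (synthetic data, not from the paper): plant a CACGTG site in every
# foreground sequence and compare against unplanted random background sequences.
rng = random.Random(0)
foreground = []
for _ in range(200):
    s = random_dna(rng, 100)
    pos = rng.randrange(0, 95)  # site fits entirely inside the 100-bp sequence
    foreground.append(s[:pos] + "CACGTG" + s[pos + 6:])
background = [random_dna(rng, 100) for _ in range(200)]

for kmer, z in rank_discriminative_kmers(foreground, background):
    print(kmer, round(z, 2))
```

Swapping the random background for a second real peak set (for example, peaks of a related TF) gives the kind of two-dataset contrast the abstract describes. A practical tool would also need to extend seeds into longer or degenerate motifs and correct for multiple testing, which this sketch omits.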

AVAILABILITY

The motifRG package is publicly available via the Bioconductor repository.

CONTACT

yzizhen@fhcrc.org

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
