SignalSpider: probabilistic pattern discovery on multiple normalized ChIP-Seq signal profiles.

Author information

Wong Ka-Chun, Li Yue, Peng Chengbin, Zhang Zhaolei

Affiliations

Department of Computer Science and Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada; CEMSE Division, King Abdullah University of Science and Technology, Thuwal, Jeddah, K.S.A.; Banting and Best Department of Medical Research and Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada.

Publication information

Bioinformatics. 2015 Jan 1;31(1):17-24. doi: 10.1093/bioinformatics/btu604. Epub 2014 Sep 5.

DOI:10.1093/bioinformatics/btu604

PMID:25192742

Abstract

MOTIVATION

Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-Seq) measures the genome-wide occupancy of transcription factors in vivo. Different combinations of DNA-binding protein occupancies may result in a gene being expressed in different tissues or at different developmental stages. To fully understand the functions of genes, it is essential to develop probabilistic models on multiple ChIP-Seq profiles to decipher the combinatorial regulatory mechanisms by multiple transcription factors.

RESULTS

In this work, we describe a probabilistic model (SignalSpider) to decipher the combinatorial binding events of multiple transcription factors. Compared with similar existing methods, SignalSpider performs better in clustering promoter and enhancer regions. Notably, SignalSpider can learn higher-order combinatorial patterns from multiple ChIP-Seq profiles. We applied SignalSpider to the normalized ChIP-Seq profiles from the ENCODE consortium and learned model instances. We observed different higher-order enrichment and depletion patterns across sets of proteins. Those clustering patterns are supported by Gene Ontology (GO) enrichment, evolutionary conservation and chromatin interaction enrichment, offering biological insights for further focused studies. We also propose a specific enrichment map visualization method to reveal the genome-wide transcription factor combinatorial patterns from the models built, extending our existing fine-scale knowledge of gene regulation to a genome-wide level.

AVAILABILITY AND IMPLEMENTATION

The matrix-algebra-optimized executables and source code are available at the authors' website: http://www.cs.toronto.edu/~wkc/SignalSpider.

Similar articles

Probabilistic Inference on Multiple Normalized Signal Profiles from Next Generation Sequencing: Transcription Factor Binding Sites.

IEEE/ACM Trans Comput Biol Bioinform. 2015 Nov-Dec;12(6):1416-28. doi: 10.1109/TCBB.2015.2424421.

iTAR: a web server for identifying target genes of transcription factors using ChIP-seq or ChIP-chip data.

BMC Genomics. 2016 Aug 12;17(1):632. doi: 10.1186/s12864-016-2963-0.

A novel statistical method for quantitative comparison of multiple ChIP-seq datasets.

Bioinformatics. 2015 Jun 15;31(12):1889-96. doi: 10.1093/bioinformatics/btv094. Epub 2015 Feb 13.

Using combined evidence from replicates to evaluate ChIP-seq peaks.

Bioinformatics. 2015 Sep 1;31(17):2761-9. doi: 10.1093/bioinformatics/btv293. Epub 2015 May 7.

Seten: a tool for systematic identification and comparison of processes, phenotypes, and diseases associated with RNA-binding proteins from condition-specific CLIP-seq profiles.

RNA. 2017 Jun;23(6):836-846. doi: 10.1261/rna.059089.116. Epub 2017 Mar 23.

Chromatin analyses of Zymoseptoria tritici: Methods for chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq).

Fungal Genet Biol. 2015 Jun;79:63-70. doi: 10.1016/j.fgb.2015.03.006. Epub 2015 Apr 7.

Probabilistic Inference on Multiple Normalized Genome-Wide Signal Profiles With Model Regularization.

IEEE Trans Nanobioscience. 2017 Jan;16(1):43-50. doi: 10.1109/TNB.2016.2631406. Epub 2016 Nov 21.

ChIPulate: A comprehensive ChIP-seq simulation pipeline.

PLoS Comput Biol. 2019 Mar 21;15(3):e1006921. doi: 10.1371/journal.pcbi.1006921. eCollection 2019 Mar.

Application of topic models to a compendium of ChIP-Seq datasets uncovers recurrent transcriptional regulatory modules.

Bioinformatics. 2020 Apr 15;36(8):2352-2358. doi: 10.1093/bioinformatics/btz975.

Cited by

Rescuing biologically relevant consensus regions across replicated samples.

BMC Bioinformatics. 2023 Jun 7;24(1):240. doi: 10.1186/s12859-023-05340-x.

ChIP-GSM: Inferring active transcription factor modules to predict functional regulatory elements.

PLoS Comput Biol. 2021 Jul 22;17(7):e1009203. doi: 10.1371/journal.pcbi.1009203. eCollection 2021 Jul.

Identifying Transcriptional Regulatory Modules Among Different Chromatin States in Mouse Neural Stem Cells.

Front Genet. 2019 Jan 15;9:731. doi: 10.3389/fgene.2018.00731. eCollection 2018.

Big data challenges in genome informatics.

Biophys Rev. 2019 Feb;11(1):51-54. doi: 10.1007/s12551-018-0493-5. Epub 2019 Jan 25.

Identifying peaks in *-seq data using shape information.

BMC Bioinformatics. 2016 Jun 6;17 Suppl 5(Suppl 5):206. doi: 10.1186/s12859-016-1042-5.

Recent advances in ChIP-seq analysis: from quality management to whole-genome annotation.

Brief Bioinform. 2017 Mar 1;18(2):279-290. doi: 10.1093/bib/bbw023.

ChIP-BIT: Bayesian inference of target genes using a novel joint probabilistic model of ChIP-seq profiles.

Nucleic Acids Res. 2016 Apr 20;44(7):e65. doi: 10.1093/nar/gkv1491. Epub 2015 Dec 23.

Computational learning on specificity-determining residue-nucleotide interactions.

Nucleic Acids Res. 2015 Dec 2;43(21):10180-9. doi: 10.1093/nar/gkv1134. Epub 2015 Nov 2.

Identifying differential transcription factor binding in ChIP-seq.

Front Genet. 2015 Apr 29;6:169. doi: 10.3389/fgene.2015.00169. eCollection 2015.
