Raviram Ramya, Rocha Pedro P, Müller Christian L, Miraldi Emily R, Badri Sana, Fu Yi, Swanzey Emily, Proudhon Charlotte, Snetkova Valentina, Bonneau Richard, Skok Jane A
Department of Pathology, New York University School of Medicine, New York, New York, United States of America.
Department of Biology, New York University, New York, New York, United States of America.
PLoS Comput Biol. 2016 Mar 3;12(3):e1004780. doi: 10.1371/journal.pcbi.1004780. eCollection 2016 Mar.
4C-Seq has proven to be a powerful technique for identifying genome-wide interactions with a single locus of interest (or "bait") that can be important for gene regulation. However, analysis of 4C-Seq data is complicated by the many biases inherent to the technique. An important consideration when dealing with 4C-Seq data is that signal resolution varies across the genome with 3D distance from the bait: the signal is highest in the region immediately surrounding the bait and becomes progressively lower in far-cis and trans. Another important aspect of 4C-Seq experiments is resolution, which is largely determined by the choice of restriction enzyme and how frequently it cuts the genome. Thus, it is important that a 4C-Seq analysis method be flexible enough to analyze data generated with different enzymes and to identify interactions across the entire genome. Current methods for 4C-Seq analysis identify interactions either in regions near the bait or in regions located in far-cis and trans, but no method comprehensively analyzes 4C signals at different length scales. In addition, some methods fail in experiments where chromatin fragments are generated using frequent-cutter restriction enzymes. Here, we describe 4C-ker, a Hidden Markov Model-based pipeline that identifies regions throughout the genome that interact with the 4C bait locus. In addition, we incorporate methods for identifying differential interactions across multiple 4C-Seq datasets collected from different genotypes or experimental conditions. Adaptive window sizes are used to correct for differences in signal coverage between near-bait regions, far-cis regions and trans chromosomes. Using several datasets, we demonstrate that 4C-ker outperforms all existing 4C-Seq pipelines in its ability to reproducibly identify interaction domains at all genomic ranges with restriction enzymes of different resolution.
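The abstract names two ingredients, adaptive windows and a Hidden Markov Model, without giving their details. The Python sketch below is a hedged illustration of how such components could fit together, not 4C-ker's actual implementation. Assumptions (none stated in the abstract): a restriction fragment counts as "observed" if it has at least one read; each window pools consecutive fragments until it contains k observed fragments; the HMM has two states (0 = background, 1 = interacting) with Poisson emissions and fixed parameters. The function names `adaptive_windows` and `viterbi_poisson` are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical sketch, NOT the 4C-ker implementation. Assumed model:
# - a restriction fragment is "observed" if it has >= 1 read;
# - each window pools consecutive fragments until it holds k observed ones;
# - a 2-state HMM (0 = background, 1 = interacting) with Poisson emissions.


def adaptive_windows(counts: Sequence[int], k: int) -> List[int]:
    """Pool per-fragment read counts into windows of k observed fragments.

    Windows span few fragments where signal is dense (near the bait) and
    many where it is sparse (far-cis, trans). Reads left over in a trailing
    partial window are merged into the previous window.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    windows: List[int] = []
    total, observed = 0, 0
    for c in counts:
        total += c
        if c > 0:
            observed += 1
        if observed == k:
            windows.append(total)
            total, observed = 0, 0
    if total > 0:  # leftover reads that did not fill a complete window
        if windows:
            windows[-1] += total
        else:
            windows.append(total)
    return windows


def poisson_logpmf(x: int, lam: float) -> float:
    """log P(X = x) for X ~ Poisson(lam)."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def viterbi_poisson(obs: Sequence[int], means: Sequence[float],
                    trans: Sequence[Sequence[float]],
                    init: Sequence[float]) -> List[int]:
    """Most likely hidden-state path of a Poisson-emission HMM (Viterbi)."""
    if not obs:
        return []
    n = len(means)
    log_t = [[math.log(p) for p in row] for row in trans]
    # score[s]: best log-probability of any path ending in state s so far
    score = [math.log(init[s]) + poisson_logpmf(obs[0], means[s])
             for s in range(n)]
    back: List[List[int]] = []
    for x in obs[1:]:
        ptr, new_score = [], []
        for s in range(n):
            prev = max(range(n), key=lambda p: score[p] + log_t[p][s])
            ptr.append(prev)
            new_score.append(score[prev] + log_t[prev][s]
                             + poisson_logpmf(x, means[s]))
        back.append(ptr)
        score = new_score
    # Trace back from the best final state
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    # Toy per-fragment counts: dense near the start, sparse afterwards
    counts = [30, 25, 0, 28, 0, 0, 1, 0, 0, 2, 0, 1, 0, 0, 1]
    w = adaptive_windows(counts, k=2)
    print(w)  # [55, 29, 4]
    states = viterbi_poisson(w, means=[2.0, 30.0],
                             trans=[[0.9, 0.1], [0.1, 0.9]], init=[0.5, 0.5])
    print(states)  # [1, 1, 0]: first two windows called "interacting"
```

In a real pipeline the emission means and transition probabilities would be estimated from the data rather than fixed by hand; they are hard-coded here only to keep the example short. Choosing windows by a fixed number of observed fragments, rather than a fixed number of base pairs, is one way to make windows shrink near the bait and grow in sparse far-cis and trans regions.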