Bioinformatics and Genomics Program, The Pennsylvania State University, University Park, State College, PA, 16802, USA.
Department of Biochemistry and Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA.
Nat Commun. 2020 Jul 9;11(1):3428. doi: 10.1038/s41467-020-17239-9.
Accurately predicting chromatin loops from genome-wide interaction matrices such as Hi-C data is critical to deepening our understanding of gene regulation. Current approaches mainly search for statistically enriched dots on a genome-wide contact map. However, given the availability of orthogonal data types such as ChIA-PET, HiChIP, Capture Hi-C, and high-throughput imaging, a supervised learning approach could facilitate the discovery of a more comprehensive set of chromatin interactions. Here, we present Peakachu, a Random Forest classification framework that predicts chromatin loops from genome-wide contact maps. We compare Peakachu with current enrichment-based approaches and find that Peakachu identifies a unique set of short-range interactions. We show that our models perform well across different platforms, sequencing depths, and species. We apply this framework to predict chromatin loops in 56 Hi-C datasets and release the results at the 3D Genome Browser.
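Framing loop calling as supervised classification means each candidate pixel of a contact map needs a feature vector and a label. The following is a minimal sketch of how such a training set might be assembled, assuming that each pixel is described by the flattened window of contacts around it, that positive labels come from loops supported by an orthogonal assay such as ChIA-PET or HiChIP, and that negatives are random off-diagonal pixels. The window half-width `w`, the sampling scheme, and the helper names `window_features` and `build_training_set` are hypothetical illustrations, not Peakachu's actual features or code. The resulting `X` and `y` lists could then be passed to a Random Forest implementation such as scikit-learn's `RandomForestClassifier`.

```python
import random


def window_features(matrix, i, j, w):
    """Flatten the (2w+1) x (2w+1) window of contacts centred on pixel (i, j).

    Hypothetical feature representation. Returns None when the window
    would run off the edge of the matrix.
    """
    n = len(matrix)
    if i - w < 0 or j - w < 0 or i + w >= n or j + w >= n:
        return None
    return [matrix[r][c]
            for r in range(i - w, i + w + 1)
            for c in range(j - w, j + w + 1)]


def build_training_set(matrix, loops, w, n_neg, min_sep, seed=0):
    """Label windows at known loops as 1 and random background pixels as 0.

    `loops` holds (i, j) bin pairs with i < j, assumed to come from an
    orthogonal assay. Negatives are sampled uniformly from the upper
    triangle at least `min_sep` bins off the diagonal (an assumption).
    """
    rng = random.Random(seed)
    seen = set(loops)  # pixels that must not be reused as negatives
    X, y = [], []
    for i, j in loops:
        f = window_features(matrix, i, j, w)
        if f is not None:
            X.append(f)
            y.append(1)
    n = len(matrix)
    negatives, attempts = 0, 0
    while negatives < n_neg and attempts < 100 * n_neg:
        attempts += 1
        i, j = rng.randrange(n), rng.randrange(n)
        if j - i < min_sep or (i, j) in seen:
            continue
        f = window_features(matrix, i, j, w)
        if f is None:
            continue
        seen.add((i, j))
        X.append(f)
        y.append(0)
        negatives += 1
    return X, y


# Toy symmetric contact map: counts decay with distance from the diagonal,
# plus one enriched "dot" at bins (5, 12) standing in for a loop.
n = 20
toy = [[10.0 / (1 + abs(r - c)) for c in range(n)] for r in range(n)]
toy[5][12] = toy[12][5] = 50.0
X, y = build_training_set(toy, loops=[(5, 12)], w=1, n_neg=5, min_sep=3)
print(len(X), sum(y))
```

Because Hi-C contact counts decay with distance from the diagonal, uniformly sampled negatives are easier to separate from loops than negatives matched to the positives' genomic distances, so a practical pipeline would likely need to control for distance when choosing background pixels.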