Hu Xihao, Shi Christina Huan, Yip Kevin Y
Department of Computer Science and Engineering.
Department of Computer Science and Engineering; Hong Kong Bioinformatics Centre; CUHK-BGI Innovation Institute of Trans-omics; Hong Kong Institute of Diabetes and Obesity, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong.
Bioinformatics. 2016 Jun 15;32(12):i111-i120. doi: 10.1093/bioinformatics/btw256.
The three-dimensional structure of genomes makes it possible for genomic regions not adjacent in the primary sequence to be spatially proximal. These DNA contacts have been found to be related to various molecular activities. Previous methods for analyzing DNA contact maps obtained from Hi-C experiments have largely focused on studying individual interactions, forming spatial clusters composed of contiguous blocks of genomic locations, or classifying these clusters into general categories based on some global properties of the contact maps.
Here, we describe a novel computational method that can flexibly identify small clusters of spatially proximal genomic regions based on their local contact patterns. Using simulated data that closely resemble Hi-C data obtained from real genome structures, we demonstrate that our method identifies spatial clusters that are more compact than those produced by methods previously used for clustering genomic regions based on DNA contact maps. The clusters identified by our method enable us to confirm functionally related genomic regions previously reported to be spatially proximal in different species. We further show that each genomic region can be assigned a numeric affinity value that indicates its degree of participation in each local cluster, and that these affinity values correlate quantitatively with DNase I hypersensitivity, gene expression, super-enhancer activity and replication timing in a cell-type-specific manner. We also show that these cluster affinity values can precisely define the boundaries of reported topologically associating domains and further define local sub-domains within each domain.
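To make the idea of per-region cluster affinity values concrete, the following is a minimal sketch and not the published BNMF algorithm, whose details are not given in this abstract. It assumes, as the name BNMF suggests but the abstract does not confirm, that a non-negative matrix factorization of the contact map is involved, and it substitutes generic Lee-Seung multiplicative updates as a stand-in. The function name `nmf_affinities`, its parameters and the toy contact map are all hypothetical and chosen only for illustration.

```python
import numpy as np

def nmf_affinities(contacts, n_clusters, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative contact map X (n x n) as X ~ W @ H.

    Row i of W is read here as the affinity of genomic bin i to each cluster.
    Generic multiplicative-update NMF; an assumption, NOT the BNMF method.
    """
    X = np.asarray(contacts, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W = rng.random((n, n_clusters)) + eps
    H = rng.random((n_clusters, n)) + eps
    for _ in range(n_iter):
        # Lee-Seung updates for the Frobenius-norm objective
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    # Rescale so every row of H sums to 1; this leaves W @ H unchanged
    # and makes the columns of W comparable with each other.
    scale = H.sum(axis=1)
    W = W * scale
    H = H / scale[:, None]
    return W, H

# Hypothetical toy contact map: two groups of four bins with strong
# within-group contacts and weak background contacts between groups.
toy = np.full((8, 8), 0.1)
toy[:4, :4] = 5.0
toy[4:, 4:] = 5.0

W, H = nmf_affinities(toy, n_clusters=2)
labels = W.argmax(axis=1)  # cluster with the highest affinity for each bin
print(np.round(W, 2))
print(labels)
```

In this sketch each bin receives one non-negative affinity per cluster rather than a single hard label, which is the property the abstract relies on when correlating affinities with functional signals. How BNMF restricts the factorization to local contact patterns is not reproduced here.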
The source code of BNMF and tutorials on how to use the software to extract local clusters from contact maps are available at http://yiplab.cse.cuhk.edu.hk/bnmf/
Supplementary data are available at Bioinformatics online.