Chen Jie, Hero Alfred O, Rajapakse Indika
CIAIC, School of Marine Science and Technology, Northwestern Polytechnical University, China; Department of Electrical Engineering and Computer Science; Department of Computational Medicine & Bioinformatics, Medical School.
Department of Electrical Engineering and Computer Science; Department of Biomedical Engineering; Department of Statistics.
Bioinformatics. 2016 Jul 15;32(14):2151-8. doi: 10.1093/bioinformatics/btw221. Epub 2016 May 5.
Topological domains have been proposed as the backbone of interphase chromosome structure. They are regions of high local contact frequency separated by sharp boundaries, and genes within a domain often have correlated transcription. In this paper, we present a computationally efficient spectral algorithm to identify topological domains from chromosome conformation data (Hi-C data). We treat the genome as a weighted graph whose vertices are loci on a chromosome and whose edge weights are the interaction frequencies between pairs of loci. Laplacian-based graph segmentation is then applied iteratively to obtain the domains at a given compactness level. Comparison with algorithms in the literature shows the advantage of the proposed strategy.
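The iterative Laplacian-based segmentation described above can be illustrated with a minimal Python sketch. This is not the authors' Matlab code. It assumes that the normalized graph Laplacian is used, that a segment is split at the sign changes of its Fiedler vector (the eigenvector of the second-smallest eigenvalue), and that "compactness" is measured by the Fiedler value, with recursion stopping once that value reaches a threshold. The names `fiedler`, `contiguous_runs`, `find_domains`, `threshold`, and `min_size` are hypothetical and chosen for illustration only.

```python
import numpy as np


def fiedler(A):
    """Return (Fiedler value, Fiedler vector) of the normalized Laplacian of A."""
    deg = A.sum(axis=1)
    # Guard against loci with zero total contact to avoid division by zero.
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vals[1], vecs[:, 1]


def contiguous_runs(signs):
    """Split a boolean sequence into half-open (start, end) runs of equal value."""
    cuts = np.flatnonzero(np.diff(signs.astype(int)) != 0) + 1
    edges = np.concatenate(([0], cuts, [len(signs)]))
    return [(int(s), int(e)) for s, e in zip(edges[:-1], edges[1:])]


def find_domains(A, threshold=0.8, min_size=3, offset=0):
    """Recursively split a symmetric contact matrix into contiguous domains.

    Returns half-open (start, end) locus index ranges. `threshold` and
    `min_size` are illustrative assumptions, not values from the paper.
    """
    n = len(A)
    if n < 2 or n <= min_size:
        return [(offset, offset + n)]
    value, vector = fiedler(A)
    if value >= threshold:  # segment is already compact enough
        return [(offset, offset + n)]
    runs = contiguous_runs(vector >= 0)
    if len(runs) < 2:  # no sign change, so nothing to split
        return [(offset, offset + n)]
    domains = []
    for s, e in runs:
        domains.extend(find_domains(A[s:e, s:e], threshold, min_size, offset + s))
    return domains


# Toy contact matrix: two dense 5-locus blocks with weak contact between them.
toy = np.full((10, 10), 0.01)
toy[:5, :5] = 1.0
toy[5:, 5:] = 1.0
toy_domains = find_domains(toy)
```

Under these assumptions, raising `threshold` yields smaller, more tightly connected domains, while lowering it yields fewer, larger ones. Real Hi-C matrices would normally be normalized before a procedure like this is applied.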
An efficient algorithm is presented to identify topological domains from the Hi-C matrix.
The Matlab source code and illustrative examples are available at http://bionetworks.ccmb.med.umich.edu/
Supplementary data are available at Bioinformatics online.