HCMB: A stable and efficient algorithm for processing the normalization of highly sparse Hi-C contact data.

Author information

Wu Honglong, Wang Xuebin, Chu Mengtian, Li Dongfang, Cheng Lixin, Zhou Ke

Affiliations

Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei 430000, China.

BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China.

Publication information

Comput Struct Biotechnol J. 2021 Apr 27;19:2637-2645. doi: 10.1016/j.csbj.2021.04.064. eCollection 2021.

Abstract

The high-throughput genome-wide chromosome conformation capture (Hi-C) method has recently become an important tool for studying chromosomal interactions, from which one can extract meaningful biological information including the P(s) curve, topologically associated domains (TADs), A/B compartments, and other biologically relevant signals. Normalization is a critical pre-processing step for downstream analyses: it removes systematic and technical biases from chromatin contact matrices that arise from differences in mappability, GC content, and restriction fragment length. High sparsity in particular poses a major challenge for this correction, so a stable and efficient method for Hi-C data normalization is urgently needed. Several matrix balancing methods, such as the Knight-Ruiz (KR) algorithm, have recently been developed to normalize Hi-C data, but KR fails to normalize contact matrices with high sparsity. Here, we present an algorithm, Hi-C Matrix Balancing (HCMB), which normalizes raw Hi-C interaction data by iteratively solving a system of equations, combined with a line search and a projection strategy. Both simulated and experimental data demonstrate that HCMB is robust and efficient in normalizing Hi-C data and preserves biologically relevant Hi-C features even at very high sparsity. HCMB is implemented in Python and is freely accessible to non-commercial users at GitHub: https://github.com/HUST-DataMan/HCMB.
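To make the balancing problem concrete: matrix-balancing normalizations look for a positive bias vector b such that the corrected matrix A[i, j] / (b[i] * b[j]) has equal row sums. The following is a minimal Python sketch of that problem solved with plain symmetric Sinkhorn-Knopp-style iterative correction (the idea behind ICE). It is NOT the HCMB algorithm and NOT KR; the function name `balance_contacts` and its parameters are hypothetical, and all-zero bins are simply masked, which is the crudest possible way to deal with sparsity.

```python
import numpy as np


def balance_contacts(contacts, max_iter=1000, tol=1e-8):
    """Symmetric matrix balancing of a contact matrix (Sinkhorn/ICE-style sketch).

    Hypothetical helper, not HCMB or KR. Finds a bias vector b so that
    balanced[i, j] = contacts[i, j] / (b[i] * b[j]) has (approximately)
    equal row sums. Bins with no contacts are masked: their rows/columns
    stay zero and their bias is NaN.
    """
    W = np.array(contacts, dtype=float)
    if W.ndim != 2 or W.shape[0] != W.shape[1] or not np.allclose(W, W.T):
        raise ValueError("contacts must be a symmetric square matrix")
    n = W.shape[0]
    bias = np.ones(n)
    valid = W.sum(axis=1) > 0  # all-zero bins cannot be balanced
    if not valid.any():
        raise ValueError("contact matrix is empty")

    for _ in range(max_iter):
        row_sums = W.sum(axis=1)
        # Relative coverage of each valid bin (1.0 means already balanced).
        rel = np.ones(n)
        rel[valid] = row_sums[valid] / row_sums[valid].mean()
        if np.abs(rel[valid] - 1.0).max() < tol:
            break
        # Damped (square-root) update: dividing by sqrt(rel_i * rel_j)
        # rather than rel_i * rel_j is a more conservative, stabler step.
        step = np.sqrt(rel)
        W /= np.outer(step, step)
        bias *= step  # invariant: W == contacts / outer(bias, bias) on valid bins

    bias[~valid] = np.nan
    return W, bias
```

For example, `W, b = balance_contacts(A)` on a symmetric matrix `A` returns a corrected matrix whose valid rows all sum to the same value, together with the per-bin biases. The square-root step is a design choice made here for robustness; ICE applies the full row-sum factor each iteration, and KR uses a faster Newton-type iteration. HCMB's line search and projection strategy are not reproduced in this sketch.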

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eea7/8120939/bd1a56b2ed21/ga1.jpg