Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, United States.
Robert H. Lurie Comprehensive Cancer Center of Northwestern University, Chicago, IL 60611, United States.
Bioinformatics. 2023 Jun 1;39(6). doi: 10.1093/bioinformatics/btad389.
With continuing efforts to improve the quality of the human reference genome and the generation of ever more personal genomes, converting genomic coordinates between genome assemblies is critical in many integrative and comparative studies. While tools have been developed for this task for linear genome signals such as ChIP-Seq, no tool exists to convert chromatin interaction data between genome assemblies, despite the importance of three-dimensional genome organization in gene regulation and disease.
Here, we present HiCLift, a fast and efficient tool that can convert the genomic coordinates of chromatin contacts, such as those from Hi-C and Micro-C, from one assembly to another, including the latest T2T-CHM13 genome. Compared with the strategy of directly remapping raw reads to a different genome, HiCLift runs on average 42 times faster (hours vs. days) while producing nearly identical contact matrices. More importantly, because HiCLift does not need to remap the raw reads, it can directly convert human patient sample data, for which the raw sequencing reads are sometimes hard to acquire or unavailable.
HiCLift is publicly available at https://github.com/XiaoTaoWang/HiCLift.
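To illustrate the general idea of converting contact coordinates without remapping reads, the following is a minimal sketch and not HiCLift's actual implementation. It assumes a toy, hand-written set of aligned blocks between a source and a target assembly (the `Block` type, the `BLOCKS` table, and the functions `lift_position` and `lift_contact` are all hypothetical names introduced here), ignores strand, and simply drops any contact for which either end falls outside the aligned blocks.

```python
from typing import List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    """One gap-free aligned block between two assemblies (hypothetical)."""
    src_chrom: str
    src_start: int  # 0-based, inclusive
    src_end: int    # exclusive
    dst_chrom: str
    dst_start: int


# Toy mapping invented for illustration; real conversions use chain files.
BLOCKS: List[Block] = [
    Block("chr1", 0, 1000, "chr1", 500),
    Block("chr1", 2000, 3000, "chr1", 1500),
    Block("chr2", 0, 5000, "chr2", 0),
]

Position = Tuple[str, int]
Contact = Tuple[str, int, str, int]


def lift_position(chrom: str, pos: int,
                  blocks: List[Block] = BLOCKS) -> Optional[Position]:
    """Map one position to the target assembly, or None if unaligned."""
    for b in blocks:
        if b.src_chrom == chrom and b.src_start <= pos < b.src_end:
            return b.dst_chrom, b.dst_start + (pos - b.src_start)
    return None


def lift_contact(contact: Contact,
                 blocks: List[Block] = BLOCKS) -> Optional[Contact]:
    """Lift both ends of a contact; drop it if either end cannot be mapped."""
    chrom1, pos1, chrom2, pos2 = contact
    end1 = lift_position(chrom1, pos1, blocks)
    end2 = lift_position(chrom2, pos2, blocks)
    if end1 is None or end2 is None:
        return None
    # Keep the two ends in a consistent (chromosome, position) order.
    if end2 < end1:
        end1, end2 = end2, end1
    return (end1[0], end1[1], end2[0], end2[1])


if __name__ == "__main__":
    for c in [("chr1", 100, "chr1", 2500), ("chr1", 1500, "chr2", 10)]:
        print(c, "->", lift_contact(c))
```

In this sketch, the contact `("chr1", 100, "chr1", 2500)` has both ends inside aligned blocks and is converted, whereas `("chr1", 1500, "chr2", 10)` is dropped because position 1500 on `chr1` lies in an unaligned gap. A real tool would also need to handle strand, large inputs, and binning into contact matrices.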