Abdennur Nezar, Fudenberg Geoffrey, Flyamer Ilya M, Galitsyna Aleksandra A, Goloborodko Anton, Imakaev Maxim, Venev Sergey V
Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, 01605, USA.
Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA, 01605, USA.
bioRxiv. 2023 Feb 15:2023.02.13.528389. doi: 10.1101/2023.02.13.528389.
The field of 3D genome organization produces large amounts of sequencing data from Hi-C and a rapidly expanding set of other chromosome conformation protocols (3C+). Massive and heterogeneous 3C+ data require high-performance and flexible processing of sequenced reads into contact pairs. To meet these challenges, we present a flexible suite of tools for contact extraction from sequencing data. The suite provides modular command-line interface (CLI) tools that can be flexibly chained into data processing pipelines. It provides both crucial core tools and auxiliary tools for building feature-rich 3C+ pipelines, including contact pair manipulation, filtration, and quality control. Benchmarking against popular 3C+ data pipelines shows the advantages of the suite for high-performance and flexible 3C+ analysis. Finally, the suite provides protocol-specific tools for multi-way contacts, haplotype-resolved contacts, and single-cell Hi-C. The combination of CLI tools and tight integration with Python data analysis libraries makes the suite a versatile foundation for a broad range of 3C+ pipelines.