Aroney Samuel T N, Newell Rhys J P, Nissen Jakob N, Camargo Antonio Pedro, Tyson Gene W, Woodcroft Ben J
Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology (QUT), Translational Research Institute, Woolloongabba 4102, Australia.
The Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen 2200, Denmark.
Bioinformatics. 2025 Mar 29;41(4). doi: 10.1093/bioinformatics/btaf147.
Genome-centric analysis of metagenomic samples is a powerful method for understanding the function of microbial communities. Calculating read coverage is a central part of this analysis, enabling differential coverage binning for recovery of genomes and estimation of microbial community composition. Coverage is determined by processing read alignments to reference sequences of either contigs or genomes. Per-reference coverage is typically calculated in an ad hoc manner, with each software package providing its own implementation and its own definition of coverage. Here we present CoverM, a unified software package that calculates several coverage statistics for contigs and genomes in an ergonomic and flexible manner. It uses "Mosdepth arrays" for computational efficiency and avoids unnecessary I/O overhead by calculating coverage statistics from streamed read alignment results.
CoverM is free software available at https://github.com/wwood/coverm. CoverM is implemented in Rust, with Python (https://github.com/apcamargo/pycoverm) and Julia (https://github.com/JuliaBinaryWrappers/CoverM_jll.jl) interfaces.
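The "Mosdepth array" idea mentioned above can be illustrated with a minimal Python sketch. It is not CoverM's Rust implementation: the function names, the 0-based half-open interval convention, and the two summary statistics are assumptions made for illustration, and CoverM's own definitions of each coverage statistic may differ. Each alignment is recorded only as an increment at its start and a decrement at its end, and a single running sum then yields per-base depth, so the cost per alignment does not grow with alignment length.

```python
from itertools import accumulate


def depth_from_alignments(ref_length, alignments):
    """Per-base depth for one reference sequence.

    `alignments` is an iterable of (start, end) pairs, assumed here to be
    0-based, half-open, and contained within the reference.
    """
    # One extra slot so an alignment ending at the last base can store its decrement.
    deltas = [0] * (ref_length + 1)
    for start, end in alignments:
        deltas[start] += 1  # depth rises where the alignment begins
        deltas[end] -= 1    # and falls just past its last aligned base
    # A running sum turns the start/end deltas into per-base depth.
    return list(accumulate(deltas))[:ref_length]


def mean_coverage(depth):
    """Average depth over all positions (illustrative definition)."""
    return sum(depth) / len(depth) if depth else 0.0


def covered_fraction(depth):
    """Fraction of positions with at least one aligned read."""
    return sum(1 for d in depth if d > 0) / len(depth) if depth else 0.0


# Three hypothetical alignments on a 10 bp reference, processed one at a time
# as they would be when streamed from an aligner.
alignments = [(0, 5), (3, 8), (3, 6)]
depth = depth_from_alignments(10, alignments)
print(depth)
print(mean_coverage(depth), covered_fraction(depth))
```

Because only the delta array is held in memory per reference, alignments can be consumed as a stream and discarded, which is consistent with the abstract's point about avoiding unnecessary I/O.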