Yu Huiyang, Shi Chunmei, He Weiming, Li Feng, Ouyang Bo
National Key Laboratory for Germplasm Innovation and Utilization of Horticultural Crops, College of Horticulture and Forestry Sciences, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, Hubei Province, China.
Key Laboratory for Vegetable Biology of Hunan Province, Engineering Research Center of Education, Ministry for Germplasm Innovation and Breeding New Varieties of Horticultural Crops, College of Horticulture, Hunan Agricultural University, No. 1 Nongda Road, Furong District, Changsha, 410128, Hunan Province, China.
Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae197.
Many genomics analyses require quantifying coverage from sequencing datasets. However, most existing tools do not provide comprehensive statistics and gain little performance from multithreading. Here, we present PanDepth, an ultrafast and efficient tool for calculating coverage and depth from sequencing alignments. PanDepth outperforms other tools in computation time and memory efficiency for both BAM and CRAM alignment files, regardless of read length. It achieves this through chromosome parallel computation and optimized data structures. It accepts sorted or unsorted BAM and CRAM alignment files, together with GTF-, GFF- or BED-formatted interval files or a specified window size. When a reference genome sequence is provided and GC content calculation is enabled, PanDepth also reports GC content statistics, which can improve the accuracy and reliability of copy number variation analysis. Overall, PanDepth is a powerful tool that accelerates scientific discovery in genomics research.
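The quantities named in the abstract (per-window coverage, mean depth and optional GC content) can be illustrated with a minimal Python sketch. This is not PanDepth's implementation, which the abstract describes only as using chromosome parallel computation and optimized data structures. The function names `per_base_depth` and `window_stats`, the difference-array approach, and the 0-based half-open interval convention are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def per_base_depth(chrom_len: int, alignments: List[Tuple[int, int]]) -> List[int]:
    """Per-base depth for one chromosome (hypothetical helper, not PanDepth's API).

    Alignments are assumed to be 0-based, half-open (start, end) intervals.
    A difference array records +1 at each start and -1 at each end, so a
    single prefix sum yields the depth at every position.
    """
    diff = [0] * (chrom_len + 1)
    for start, end in alignments:
        s, e = max(0, start), min(chrom_len, end)  # clip to the chromosome
        if s < e:
            diff[s] += 1
            diff[e] -= 1
    depth, running = [], 0
    for i in range(chrom_len):
        running += diff[i]
        depth.append(running)
    return depth


def window_stats(depth: List[int], window: int,
                 sequence: Optional[str] = None) -> List[Dict[str, float]]:
    """Summarise depth in fixed-size windows; add GC% if a sequence is given."""
    rows = []
    for start in range(0, len(depth), window):
        end = min(start + window, len(depth))
        chunk = depth[start:end]
        covered = sum(1 for d in chunk if d > 0)  # bases with depth >= 1
        row = {
            "start": start,
            "end": end,
            "covered_bases": covered,
            "coverage_pct": 100.0 * covered / len(chunk),
            "mean_depth": sum(chunk) / len(chunk),
        }
        if sequence is not None:
            seq = sequence[start:end].upper()
            gc = sum(1 for base in seq if base in "GC")
            row["gc_pct"] = 100.0 * gc / len(seq) if seq else 0.0
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Toy data: a 10 bp chromosome with two overlapping alignments.
    toy_depth = per_base_depth(10, [(0, 4), (2, 6)])
    for row in window_stats(toy_depth, window=5, sequence="GGCCAATTAA"):
        print(row)
```

Because each chromosome is processed independently in this sketch, per-chromosome calls could in principle be dispatched to separate workers, which is one plausible reading of "chromosome parallel computation"; the abstract does not specify how PanDepth does this.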