Department of Epigenetics, Van Andel Institute, Grand Rapids, MI 49503, United States.
Center for Computational and Genomic Medicine, Children's Hospital of Philadelphia, Philadelphia, PA 19104, United States.
Bioinformatics. 2023 Dec 1;39(12). doi: 10.1093/bioinformatics/btad729.
In whole genome sequencing data, polymerase chain reaction amplification results in duplicate DNA fragments coming from the same location in the genome. The process of preparing a whole genome bisulfite sequencing (WGBS) library, on the other hand, can create two DNA fragments from the same location that should not be considered duplicates. Currently, only one WGBS-aware duplicate marking tool exists. However, it only works with the output from a single tool, does not accept streaming input or output, and requires a substantial amount of memory relative to the input size. Dupsifter provides an aligner-agnostic duplicate marking tool that is lightweight, has streaming capabilities, and is memory efficient.
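As a rough illustration of why WGBS data needs its own duplicate logic, the sketch below compares a position-only duplicate signature with one that also includes the bisulfite strand. It is not Dupsifter's implementation: the `Fragment` record, the `bs_strand` field (with the conventional labels `OT` and `OB` for original top and original bottom), and the `mark_duplicates` function are hypothetical names introduced here, and the assumption is that two fragments at the same location are distinct when they derive from different bisulfite-converted strands.

```python
from collections import namedtuple

# Hypothetical, simplified fragment record; real tools operate on SAM/BAM records.
# bs_strand is an assumed label: "OT" (original top) or "OB" (original bottom).
Fragment = namedtuple("Fragment", ["name", "chrom", "start", "end", "bs_strand"])


def mark_duplicates(fragments, wgbs_aware=True):
    """Return the names of fragments flagged as duplicates.

    The first fragment seen for each signature is kept; later fragments
    with the same signature are flagged. Input is consumed one fragment
    at a time, so any iterable (including a stream) works.
    """
    seen = set()
    duplicates = set()
    for frag in fragments:
        signature = (frag.chrom, frag.start, frag.end)
        if wgbs_aware:
            # Same location but different bisulfite strand: not a duplicate.
            signature += (frag.bs_strand,)
        if signature in seen:
            duplicates.add(frag.name)
        else:
            seen.add(signature)
    return duplicates


fragments = [
    Fragment("r1", "chr1", 100, 250, "OT"),
    Fragment("r2", "chr1", 100, 250, "OB"),  # same location, other strand
    Fragment("r3", "chr1", 100, 250, "OT"),  # same location and strand as r1
]

wgbs_dups = mark_duplicates(fragments, wgbs_aware=True)
naive_dups = mark_duplicates(fragments, wgbs_aware=False)
```

With the strand included in the signature, only `r3` is flagged; with the position-only signature, `r2` is flagged as well, which is the kind of false duplicate call a WGBS-aware tool is meant to avoid.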
Source code and binaries are freely available at https://github.com/huishenlab/dupsifter under the MIT license. Dupsifter is implemented in C and is supported on macOS and Linux.