Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0349 Oslo, Norway.
Stanford University School of Medicine, Stanford Cancer Institute, Stanford, CA 94304, USA.
Bioinformatics. 2021 Jul 12;37(11):1607-1609. doi: 10.1093/bioinformatics/btaa928.
Accurate motif enrichment analysis depends on the choice of background DNA sequences, which should ideally match the sequence composition of the foreground sequences. Otherwise, sequence biases in the genome, such as GC bias, can produce false-positive enrichment. Relying on an appropriate set of background sequences is therefore crucial for enrichment analysis.
We developed BiasAway, a command-line tool with a dedicated, easy-to-use web server. Through four different models, it either generates synthetic sequences matching any k-mer nucleotide composition or selects genomic DNA sequences matching the mononucleotide composition of the foreground sequences. For genomic sequences, we provide precomputed partitions of genomes from nine species, with five different bin sizes, to generate appropriate genomic background sequences.
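The two strategies described above can be illustrated with a minimal Python sketch. It is not BiasAway's code or API: the function names (`shuffle_mononucleotides`, `select_gc_matched`), the equal-width GC binning and the `n_bins` parameter are assumptions made for illustration only. The first function builds a synthetic background by permuting bases, which preserves mononucleotide composition exactly; the second selects candidate genomic sequences whose GC content falls in the same bin as each foreground sequence.

```python
import random
from collections import defaultdict


def gc_fraction(seq):
    """Fraction of G/C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def shuffle_mononucleotides(seq, rng):
    """Synthetic background: permute the bases of seq.

    The 1-mer composition is preserved exactly; higher-order k-mer
    composition is not (hypothetical helper, not BiasAway's method).
    """
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def gc_bin(seq, n_bins):
    """Map the GC fraction in [0, 1] to a bin index in [0, n_bins - 1]."""
    return min(int(gc_fraction(seq) * n_bins), n_bins - 1)


def select_gc_matched(foreground, pool, n_bins=10, seed=0):
    """Genomic background: for each foreground sequence, draw one unused
    pool sequence from the same GC bin (skipped if the bin is exhausted)."""
    rng = random.Random(seed)
    by_bin = defaultdict(list)
    for candidate in pool:
        by_bin[gc_bin(candidate, n_bins)].append(candidate)
    for candidates in by_bin.values():
        rng.shuffle(candidates)

    selected = []
    for seq in foreground:
        candidates = by_bin[gc_bin(seq, n_bins)]
        if candidates:
            # pop() draws without replacement from the shuffled bin
            selected.append(candidates.pop())
    return selected


if __name__ == "__main__":
    fg = ["GGGCGCGCGC", "ATATATATAA"]
    pool = ["CCCCGGGGCG", "TTTTAAAATA", "ACGTACGTAC", "GCGCGCGCGG"]
    print(select_gc_matched(fg, pool, n_bins=5))
    print(shuffle_mononucleotides(fg[0], random.Random(1)))
```

A real analysis would usually also match sequence length and draw from a far larger genomic pool; this sketch matches on GC content alone to keep the idea visible.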
BiasAway source code is freely available from Bitbucket (https://bitbucket.org/CBGR/biasaway) and can be easily installed using bioconda or pip. The web server is available at https://biasaway.uio.no and detailed documentation is available at https://biasaway.readthedocs.io.
Supplementary data are available at Bioinformatics online.