Raditsa Vladimir V, Tsukanov Anton V, Bogomolov Anton G, Levitsky Victor G
Department of System Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia.
Department of Cell Biology, Institute of Cytology and Genetics, Novosibirsk 630090, Russia.
NAR Genom Bioinform. 2024 Jul 27;6(3):lqae090. doi: 10.1093/nargab/lqae090. eCollection 2024 Sep.
Efficient motif discovery from the results of genome-wide mapping of transcription factor binding sites (ChIP-seq) depends on the choice of background nucleotide sequences. The foreground sequences (ChIP-seq peaks) contain not only the specific motifs of target transcription factors, but also motifs that are overrepresented throughout the genome, such as simple sequence repeats. We performed a large-scale comparison of the 'synthetic' and 'genomic' approaches to generating background sequences for motif discovery. The 'synthetic' approach shuffled nucleotides within peaks, while the 'genomic' approach selected sequences from the reference genome, either randomly or only from gene promoters, according to the fraction of A/T nucleotides in each sequence. We compiled benchmark collections of ChIP-seq datasets for mouse, human and Arabidopsis, and performed motif discovery. We showed that the genomic approach provides both more robust detection of the known motifs of target transcription factors and more stringent exclusion of simple sequence repeats as possible non-specific motifs. The advantage of the genomic approach over the synthetic approach was greater in plants than in mammals. We developed the AntiNoise web service (https://denovosea.icgbio.ru/antinoise/), which implements the genomic approach to extract background sequences for twelve eukaryotic genomes.
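The difference between the two background strategies can be illustrated with a minimal Python sketch. It is not the AntiNoise implementation: the function names, the `tolerance` and `max_tries` parameters, and the treatment of the genome as a single string are assumptions made for illustration, and the abstract does not specify how closely the A/T fraction is matched. The promoter-restricted variant would draw candidates from promoter regions only and is not shown.

```python
import random


def at_fraction(seq):
    """Fraction of A/T nucleotides in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("A") + seq.count("T")) / len(seq)


def synthetic_background(peaks, rng):
    """'Synthetic' approach: shuffle the nucleotides of each peak.

    Each output sequence keeps the exact nucleotide composition of its peak
    but loses the original order.
    """
    shuffled = []
    for peak in peaks:
        bases = list(peak)
        rng.shuffle(bases)
        shuffled.append("".join(bases))
    return shuffled


def genomic_background(peaks, genome, rng, tolerance=0.05, max_tries=1000):
    """'Genomic' approach: for each peak, draw a random genome window of the
    same length whose A/T fraction is within `tolerance` of the peak's.

    `tolerance` and `max_tries` are hypothetical parameters. A peak with no
    matching window after `max_tries` draws is skipped.
    """
    selected = []
    for peak in peaks:
        length = len(peak)
        if length == 0 or length > len(genome):
            continue
        target = at_fraction(peak)
        for _ in range(max_tries):
            start = rng.randrange(len(genome) - length + 1)
            candidate = genome[start:start + length]
            if abs(at_fraction(candidate) - target) <= tolerance:
                selected.append(candidate)
                break
    return selected
```

In this sketch the shuffled sequences are artificial, whereas the genomic sequences are real genome fragments, so they can still contain simple sequence repeats and other motifs that are common throughout the genome.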