Uhl Michael, Tran Van Dinh, Backofen Rolf
Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, Freiburg, 79110, Germany.
Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schaenzlestr. 18, Freiburg, 79104, Germany.
BMC Genomics. 2020 Dec 17;21(1):894. doi: 10.1186/s12864-020-07297-0.
Current peak callers for identifying RNA-binding protein (RBP) binding sites from CLIP-seq data take genomic read profiles into account, but they ignore the underlying transcript information, that is, information regarding splicing events. So far, no studies have examined this issue more closely.
Here we show that current peak callers are susceptible to false peak calling near exon borders. We quantify the extent of this effect in publicly available datasets and find it to be substantial. By providing a tool called CLIPcontext for automatic extraction of transcript and genomic context sequences, we further demonstrate that the choice of context affects the performance of RBP binding site prediction tools. Moreover, we show that known motifs of exon-binding RBPs are often enriched in transcript context sites, which should enable the recovery of more authentic binding sites. Finally, we discuss possible strategies for integrating transcript information into future workflows.
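The difference between the two context types can be illustrated with a small Python sketch that extracts both for a site close to an exon end. This is a minimal toy, assuming a single plus-strand transcript with 0-based, half-open exon coordinates and a genome held in memory as one string. The function names are hypothetical and do not reflect the CLIPcontext interface or implementation.

```python
# Toy sketch (hypothetical helper names, not the CLIPcontext API).
# Assumes one plus-strand transcript and 0-based, half-open exon intervals.
from typing import List, Tuple

Exon = Tuple[int, int]


def genomic_context(genome: str, site: int, ext: int) -> str:
    """Window of +/- ext nt around site on the genome, clipped at the ends."""
    lo = max(0, site - ext)
    hi = min(len(genome), site + ext + 1)
    return genome[lo:hi]


def genomic_to_transcript(exons: List[Exon], site: int) -> int:
    """Map a genomic position to its offset within the spliced transcript."""
    offset = 0
    for start, end in exons:
        if start <= site < end:
            return offset + (site - start)
        offset += end - start
    raise ValueError(f"position {site} is not exonic")


def transcript_context(genome: str, exons: List[Exon], site: int, ext: int) -> str:
    """Window of +/- ext nt around site on the spliced transcript."""
    spliced = "".join(genome[start:end] for start, end in exons)
    tpos = genomic_to_transcript(exons, site)
    lo = max(0, tpos - ext)
    hi = min(len(spliced), tpos + ext + 1)
    return spliced[lo:hi]


# Toy locus: exon 1 [0, 10), intron [10, 20) in lowercase, exon 2 [20, 30).
GENOME = "AAAAA" + "CCCCC" + "g" * 10 + "TTTTT" + "GGGGG"
EXONS: List[Exon] = [(0, 10), (20, 30)]
SITE, EXT = 8, 4  # site near the end of exon 1 (last exonic base: 9)

print(genomic_context(GENOME, SITE, EXT))            # ACCCCCggg
print(transcript_context(GENOME, EXONS, SITE, EXT))  # ACCCCCTTT
```

With a 4 nt extension, the genomic window picks up three intronic bases (`ggg`), whereas the transcript window continues into the downstream exon (`TTT`). Minus-strand transcripts would additionally require reverse complementation, which this sketch omits.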
Our results demonstrate the importance of incorporating transcript information in CLIP-seq data analysis. Taking advantage of the underlying transcript information should therefore become an integral part of future peak calling and downstream analysis tools.