Zhang Zijun, Xing Yi
Bioinformatics Interdepartmental Graduate Program, University of California, Los Angeles, Los Angeles, CA 90095, USA.
Department of Microbiology, Immunology & Molecular Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA.
Nucleic Acids Res. 2017 Sep 19;45(16):9260-9271. doi: 10.1093/nar/gkx646.
Crosslinking or RNA immunoprecipitation followed by sequencing (CLIP-seq or RIP-seq) allows transcriptome-wide discovery of RNA regulatory sites. As CLIP-seq/RIP-seq reads are short, existing computational tools focus on uniquely mapped reads, while reads mapped to multiple loci are discarded. We present CLAM (CLIP-seq Analysis of Multi-mapped reads). CLAM uses an expectation-maximization algorithm to assign multi-mapped reads and calls peaks combining uniquely and multi-mapped reads. To demonstrate the utility of CLAM, we applied it to a wide range of public CLIP-seq/RIP-seq datasets involving numerous splicing factors, microRNAs and m6A RNA methylation. CLAM recovered a large number of novel RNA regulatory sites inaccessible by uniquely mapped reads. The functional significance of these sites was demonstrated by consensus motif patterns and association with alternative splicing (splicing factors), transcript abundance (AGO2) and mRNA half-life (m6A). CLAM provides a useful tool to discover novel protein-RNA interactions and RNA modification sites from CLIP-seq and RIP-seq data, and reveals the significant contribution of repetitive elements to the RNA regulatory landscape of the human transcriptome.
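The abstract says only that CLAM uses an expectation-maximization algorithm to assign multi-mapped reads; it gives no model details. As an illustration of the general idea, the following is a minimal sketch of a generic EM scheme, not CLAM's actual implementation. It assumes each read is split across its candidate loci in proportion to the current estimated abundance of each locus. The names `em_assign` and `read_to_loci`, the uniform initialization, and the stopping rule are all hypothetical choices made for this example.

```python
from collections import defaultdict


def em_assign(read_to_loci, n_iter=50, tol=1e-6):
    """Fractionally assign reads to loci with a simple EM loop.

    read_to_loci: dict mapping a read id to the list of loci it aligns to.
    Uniquely mapped reads have exactly one candidate locus.
    Returns (weights, abundance): weights[read][locus] is the fraction of
    the read assigned to that locus; abundance[locus] is the estimated
    share of all reads originating from that locus.
    NOTE: an illustrative simplification, not the model used by CLAM.
    """
    loci = sorted({l for cands in read_to_loci.values() for l in cands})
    # Assumed starting point: every locus is equally abundant.
    abundance = {l: 1.0 / len(loci) for l in loci}
    weights = {}
    n_reads = len(read_to_loci)

    for _ in range(n_iter):
        # E-step: split each read across its candidate loci in proportion
        # to the current abundance estimates.
        totals = defaultdict(float)
        for read, cands in read_to_loci.items():
            denom = sum(abundance[l] for l in cands)
            weights[read] = {l: abundance[l] / denom for l in cands}
            for l, w in weights[read].items():
                totals[l] += w

        # M-step: re-estimate abundance from the fractional assignments.
        updated = {l: totals[l] / n_reads for l in loci}
        delta = max(abs(updated[l] - abundance[l]) for l in loci)
        abundance = updated
        if delta < tol:
            break

    return weights, abundance


if __name__ == "__main__":
    # Toy input: three reads unique to locus A, one read shared by A and B.
    reads = {"r1": ["A"], "r2": ["A"], "r3": ["A"], "r4": ["A", "B"]}
    weights, abundance = em_assign(reads)
    print(weights["r4"])
```

In this toy input, locus A is supported by uniquely mapped reads and locus B is not, so successive iterations shift the shared read `r4` toward A. Under this assumed scheme the resulting fractional weights could be summed per position alongside uniquely mapped reads before peak calling.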