Rasmussen Simon H, Jacobsen Anders, Krogh Anders
Bioinformatics Centre, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen N, 2200, Denmark.
Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, NY, 10065, USA.
Silence. 2013 May 20;4(1):2. doi: 10.1186/1758-907X-4-2.
Post-transcriptional regulation of gene expression by small RNAs and RNA-binding proteins is of fundamental importance in the development of complex organisms. Dysregulation of regulatory RNAs can influence the onset and progression of many diseases, and these RNAs are potential treatment targets. Post-transcriptional regulation by small RNAs is mediated through partially complementary binding to messenger RNAs, which leaves nucleotide signatures, or motifs, throughout the entire transcriptome. Computational methods for the discovery and analysis of sequence motifs in high-throughput mRNA expression profiling experiments are becoming increasingly important tools for identifying post-transcriptional regulatory motifs and for inferring the regulators and their targets.
cWords is a method designed for regulatory motif discovery in differential case-control mRNA expression datasets. We have improved the algorithms and statistical methods of cWords, resulting in a speed gain of at least 100-fold over the previous implementation. On a benchmark dataset of 19 microRNA (miRNA) perturbation experiments, cWords showed equal or better performance than two comparable methods, miReduce and Sylamer. We have developed rigorous motif clustering and visualization that accompany the cWords analysis for more intuitive and effective data interpretation. To demonstrate the versatility of cWords, we show that it can also be used to identify potential siRNA off-target binding. Moreover, cWords analysis of an experiment profiling mRNAs bound by Argonaute ribonucleoprotein particles discovered endogenous miRNA binding motifs.
cWords is an unbiased, flexible and easy-to-use tool designed for regulatory motif discovery in differential case-control mRNA expression datasets. cWords is based on rigorous statistical methods and demonstrates comparable or better performance than other existing methods. Rich visualization of results promotes intuitive and efficient interpretation of data. cWords is available as a stand-alone open-source program on GitHub (https://github.com/simras/cWords) and as a web service at http://servers.binf.ku.dk/cwords/.