Agostini Federico, Cirillo Davide, Delli Ponti Riccardo, Tartaglia Gian Gaetano
Gene Function and Evolution, Centre for Genomic Regulation (CRG), C/ Dr. Aiguader 88, 08003 Barcelona, Spain.
BMC Genomics. 2014 Oct 23;15(1):925. doi: 10.1186/1471-2164-15-925.
The large amount of data produced by high-throughput sequencing poses new computational challenges. In the last decade, several tools have been developed for the identification of transcription and splicing factor binding sites.
Here, we introduce the SeAMotE (Sequence Analysis of Motifs Enrichment) algorithm for the discovery of regulatory regions in nucleic acid sequences. SeAMotE provides (i) a robust analysis of high-throughput sequence sets, (ii) a motif search based on pattern occurrences and (iii) an easy-to-use web-server interface. We applied our method to recently published data, including 351 chromatin immunoprecipitation (ChIP) and 13 crosslinking immunoprecipitation (CLIP) experiments, and compared our results with those of other well-established motif discovery tools. SeAMotE shows an average accuracy of 80% in finding discriminative motifs and outperforms other methods available in the literature.
SeAMotE is a fast, accurate and flexible algorithm for the identification of sequence patterns involved in protein-DNA and protein-RNA recognition. The server can be freely accessed at http://s.tartaglialab.com/new_submission/seamote.
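The idea of an occurrence-based discriminative motif search can be illustrated with a minimal sketch. This is not the SeAMotE implementation: the function names (`kmers`, `coverage`, `discriminative_kmers`), the fixed k-mer length, the coverage-difference score and the toy sequences are all assumptions chosen for illustration. The sketch ranks k-mers by the fraction of positive sequences containing them minus the fraction of negative sequences containing them.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers in one sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def coverage(seqs, k):
    """Fraction of sequences that contain each k-mer at least once."""
    counts = Counter()
    for s in seqs:
        counts.update(kmers(s, k))  # each k-mer counted once per sequence
    n = len(seqs)
    return {motif: c / n for motif, c in counts.items()}


def discriminative_kmers(positive, negative, k=5, top=5):
    """Rank k-mers by coverage in the positive set minus coverage in the negative set.

    Hypothetical scoring scheme, used here only to illustrate the concept.
    """
    pos = coverage(positive, k)
    neg = coverage(negative, k)
    scored = [(motif, pos[motif] - neg.get(motif, 0.0)) for motif in pos]
    # Highest score first; ties broken alphabetically for reproducibility.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top]


# Toy data (invented): every "bound" sequence carries GTGGG, no background one does.
positive = ["ACGTGGGA", "TTGTGGGC", "GTGGGATA"]
negative = ["ACACACAC", "TTTTAAAA", "CCGTATAC"]

best = discriminative_kmers(positive, negative, k=5, top=1)
print(best)
```

On real ChIP or CLIP data, a statistical test on the occurrence counts and support for degenerate (IUPAC) patterns would likely be needed; both are omitted here for brevity.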