Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816, USA and Burnett School of Biomedical Science, University of Central Florida, Orlando, FL 32816, USA.
Nucleic Acids Res. 2014 Mar;42(5):e35. doi: 10.1093/nar/gkt1288. Epub 2013 Dec 9.
The identification of transcription factor binding motifs is important for the study of gene transcriptional regulation. Chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-seq) provides an unprecedented opportunity to discover binding motifs. Computational methods have been developed to identify motifs from ChIP-seq data, but they encounter several problems. For example, existing methods often do not scale to the large number of sequences obtained from ChIP-seq peak regions. Some methods rely heavily on well-annotated motifs even though the number of known motifs is limited. To simplify the problem, de novo motif discovery methods often neglect underrepresented motifs in ChIP-seq peak regions. To address these issues, we developed a novel approach called SIOMICS for de novo motif discovery from ChIP-seq data. Tested on 13 ChIP-seq data sets, SIOMICS identified motifs of many known and new cofactors. Tested on 13 simulated random data sets, SIOMICS discovered no motif in any data set. Compared with two recently developed methods for motif discovery, SIOMICS shows advantages in terms of speed, the number of known cofactor motifs predicted in experimental data sets and the number of false motifs predicted in random data sets. The SIOMICS software is freely available at http://eecs.ucf.edu/~xiaoman/SIOMICS/SIOMICS.html.