Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
Nat Protoc. 2010 Feb;5(2):323-34. doi: 10.1038/nprot.2009.158. Epub 2010 Feb 4.
Genome-wide location analysis has become a standard technology for unraveling gene regulation networks. The accurate characterization of nucleotide signatures in sequences is key to uncovering the regulatory logic but remains a computational challenge. This protocol describes how best to characterize these signatures (motifs) using the new standalone version of Trawler, which was designed and optimized to analyze chromatin immunoprecipitation (ChIP) data sets. In particular, we describe the three main steps of Trawler_standalone (motif discovery, clustering and visualization) and discuss the appropriate parameters to use in each step, depending on the data set and the biological questions addressed. Compared with five other motif discovery programs, Trawler_standalone is in most cases the fastest algorithm that accurately predicts the correct motifs, especially for large data sets. Its running time ranges from a few seconds to several minutes, depending on the size of the data set and the parameters used. This protocol is best suited for bioinformaticians seeking to use Trawler_standalone in a high-throughput manner.