Turku Centre for Biotechnology, FI-20521 Turku, Finland.
BMC Genomics. 2009 Dec 18;10:618. doi: 10.1186/1471-2164-10-618.
Chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-seq) is increasingly applied to study transcriptional regulation on a genome-wide scale. While numerous algorithms have recently been proposed for analysing large ChIP-seq datasets, their relative merits and potential limitations in practical applications remain unclear.
The present study compares state-of-the-art algorithms for detecting transcription factor binding sites in four diverse ChIP-seq datasets under a variety of practical research settings. First, we demonstrate how the biological conclusions may change dramatically depending on which algorithm is applied. The reproducibility across biological replicates is then investigated as an internal validation of the detections. Finally, the binding sites predicted by each method are compared with high-scoring binding motifs as well as with binding regions confirmed in independent qPCR experiments.
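As an illustration of what a reproducibility check across biological replicates can look like, the following minimal sketch computes the fraction of peaks called in one replicate that overlap at least one peak called in another. The abstract does not state which reproducibility measure the study used, so the overlap criterion, the `Peak` representation, and the function names here are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical peak representation: (chromosome, start, end), half-open interval.
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """Return True if two peaks lie on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def reproducibility(rep1: List[Peak], rep2: List[Peak]) -> float:
    """Fraction of peaks in rep1 that overlap at least one peak in rep2.

    This is an assumed, simple overlap-based measure, not necessarily
    the one used in the study. Returns 0.0 when rep1 is empty.
    """
    if not rep1:
        return 0.0
    hits = sum(1 for p in rep1 if any(overlaps(p, q) for q in rep2))
    return hits / len(rep1)


if __name__ == "__main__":
    # Toy peak lists, invented for illustration only.
    replicate_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    replicate_b = [("chr1", 150, 250), ("chr2", 300, 400)]
    print(reproducibility(replicate_a, replicate_b))
```

The measure is asymmetric, so in practice it would be computed in both directions. The quadratic scan is adequate for small examples, while genome-scale peak lists would call for sorted intervals or an interval tree.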
In general, our results indicate that the optimal choice of computational approach depends heavily on the dataset under analysis. In addition to giving users of this technology valuable information about the characteristics of the binding site detection approaches, the systematic evaluation framework also provides a useful reference for developers of improved algorithms for ChIP-seq data.