Sun Chun-Xiao, Yang Yu, Wang Hua, Wang Wen-Hu
College of Science, Northwest A&F University, Yangling 712100, China.
School of Computer Science, Pingdingshan University, Pingdingshan 467000, China.
Entropy (Basel). 2019 Aug 16;21(8):802. doi: 10.3390/e21080802.
Chromatin immunoprecipitation combined with next-generation sequencing (ChIP-Seq) has enabled the identification of transcription factor binding sites (TFBSs) on a genome-wide scale. To discover TFBSs effectively and efficiently in the thousand or more DNA sequences of a ChIP-Seq data set, we propose a new algorithm named AP-ChIP. First, we set two thresholds based on probabilistic analysis to construct and then filter the cluster subsets. Next, we apply Affinity Propagation (AP) clustering to the candidate cluster subsets to find potential motifs. Experimental results on simulated data show that AP-ChIP makes nearly accurate predictions of TFBSs in a reasonable time. The validity of AP-ChIP is also tested on a real ChIP-Seq data set.
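The AP clustering step can be illustrated with a minimal sketch in plain Python. It is not the AP-ChIP implementation: the abstract does not state the similarity measure, the preference value, or how the two thresholds filter the subsets, so the sketch assumes negative Hamming distance as the similarity between equal-length k-mers and the median similarity as the preference. The names `hamming`, `affinity_propagation` and `cluster_kmers`, and the toy k-mers, are hypothetical.

```python
from statistics import median


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def affinity_propagation(S, damping=0.5, iterations=200):
    """Standard AP message passing on a similarity matrix S (list of lists).

    S[k][k] holds the preference of point k. Returns (exemplars, labels),
    both as indices. Runs a fixed number of iterations (no convergence test).
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) <- s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            second = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                rival = second if k == best else scores[best]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # a(i,k) <- min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) <- sum of positive r(i',k), i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [], []
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


def cluster_kmers(kmers):
    """Cluster equal-length k-mers; assumed similarity is -Hamming distance."""
    n = len(kmers)
    S = [[-float(hamming(a, b)) for b in kmers] for a in kmers]
    # Assumed preference: median of all off-diagonal similarities.
    pref = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        S[i][i] = pref
    exemplars, labels = affinity_propagation(S)
    return [kmers[k] for k in exemplars], [kmers[k] for k in labels]


# Toy input: two planted motifs, each with two single-mismatch variants.
kmers = ["CCGTACGT", "ACGTACGT", "ACGTACGA",
         "TTGGCCAT", "TTGGCCAA", "GTGGCCAA"]
exemplars, labels = cluster_kmers(kmers)
print(exemplars)
print(labels)
```

On this toy input the two planted motifs are expected to emerge as exemplars, each serving as a candidate motif for its cluster. In a real setting the damping factor and the preference would need tuning, since the preference controls how many clusters AP returns.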