School of Software, Dalian University of Technology, Dalian, China.
BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S22. doi: 10.1186/1471-2105-12-S1-S22.
Phosphorylation motifs represent common patterns around the phosphorylation site. The discovery of such motifs reveals the underlying regulation mechanism and facilitates the prediction of unknown phosphorylation events. To date, researchers have gathered large amounts of phosphorylation data, making it possible to perform substrate-driven motif discovery using data mining techniques.
We describe an algorithm called Motif-All that efficiently identifies all statistically significant motifs. The method uses a support constraint to reduce the search space and to avoid generating random artifacts. Because phosphorylated peptides are far fewer than unphosphorylated ones, we divide the mining process into two stages: the first generates candidates from the set of phosphorylated sequences using only the support constraint, and the second tests the statistical significance of each candidate using the odds ratio derived from the whole data set. Experimental results on real data show that Motif-All outperforms current algorithms in terms of both effectiveness and efficiency.
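The two-stage idea can be illustrated with a minimal Python sketch. This is not the released Motif-All code. It assumes that peptides are equal-length strings centered on the phosphorylation site, that a motif is a set of (position, residue) pairs, that candidates are grown level by level in an Apriori-like fashion, and that the odds ratio is computed from a 2x2 contingency table with a 0.5 continuity correction. All function names and parameters (`frequent_motifs`, `odds_ratio`, `min_support`, `max_size`, `min_odds_ratio`) are hypothetical.

```python
from itertools import combinations


def matches(motif, peptide):
    """True if the peptide carries every (position, residue) pair of the motif."""
    return all(peptide[pos] == res for pos, res in motif)


def support(motif, peptides):
    """Number of peptides that match the motif."""
    return sum(matches(motif, p) for p in peptides)


def frequent_motifs(foreground, min_support, max_size=3):
    """Stage 1: grow motifs level by level using only the support constraint.

    Assumption: the central position is the phosphorylation site and is
    therefore excluded from the motif positions.
    """
    width = len(foreground[0])
    center = width // 2
    items = {(i, p[i]) for p in foreground for i in range(width) if i != center}

    # Level 1: single (position, residue) pairs that are frequent enough.
    level = {}
    for item in items:
        motif = frozenset([item])
        count = support(motif, foreground)
        if count >= min_support:
            level[motif] = count

    result = {}
    size = 1
    while level and size <= max_size:
        result.update(level)
        # Join frequent motifs of the current size into candidates one larger,
        # keeping only candidates that fix each position at most once.
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            positions = {pos for pos, _ in union}
            if len(union) == size + 1 and len(positions) == size + 1:
                candidates.add(union)
        level = {}
        for motif in candidates:
            count = support(motif, foreground)
            if count >= min_support:
                level[motif] = count
        size += 1
    return result


def odds_ratio(motif, foreground, background):
    """Stage 2 statistic: odds ratio with a 0.5 continuity correction (an assumption)."""
    a = support(motif, foreground)   # phosphorylated, matching
    b = len(foreground) - a          # phosphorylated, not matching
    c = support(motif, background)   # unphosphorylated, matching
    d = len(background) - c          # unphosphorylated, not matching
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def significant_motifs(foreground, background, min_support, min_odds_ratio):
    """Run both stages and keep candidates whose odds ratio passes the threshold."""
    candidates = frequent_motifs(foreground, min_support)
    return {
        motif: odds_ratio(motif, foreground, background)
        for motif in candidates
        if odds_ratio(motif, foreground, background) >= min_odds_ratio
    }
```

In this sketch the support threshold is applied only to the phosphorylated set, so the typically much larger unphosphorylated set is scanned only for the candidates that survive the first stage.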
Motif-All is a useful tool for discovering statistically significant phosphorylation motifs. Source code and data sets are available at: http://bioinformatics.ust.hk/MotifAll.rar.