Hasselt University, Agoralaan Gebouw D, B-3590 Diepenbeek, Belgium.
IEEE/ACM Trans Comput Biol Bioinform. 2011 Sep-Oct;8(5):1344-57. doi: 10.1109/TCBB.2011.17.
Correlated motif mining (CMM) is the problem of finding overrepresented pairs of patterns, called motifs, in sequences of interacting proteins. Algorithmic solutions for CMM thereby provide a computational method for predicting binding sites for protein interaction. In this paper, we adopt a motif-driven approach in which the support of candidate motif pairs is evaluated in the network. We experimentally establish the superiority of the Chi-square-based support measure over other support measures. Furthermore, we show that CMM is an NP-hard problem for a large class of support measures (including Chi-square) and reformulate the search for correlated motifs as a combinatorial optimization problem. We then present the generic metaheuristic SLIDER, which uses steepest ascent with a neighborhood function based on sliding motifs and employs the Chi-square-based support measure. We show that SLIDER outperforms existing motif-driven CMM methods and scales to large protein-protein interaction networks. The SLIDER implementation and the data used in the experiments are available at http://bioinformatics.uhasselt.be.
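The search described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation, and every name in it (`contains`, `chi_square_support`, `slides`, `steepest_ascent`) is hypothetical. It assumes that motifs are fixed-length strings with `x` as a wildcard, that a protein pair is selected by a motif pair when one protein contains the first motif and the other contains the second, that support is the Pearson Chi-square statistic of the 2x2 table (selected vs. interacting) over all unordered protein pairs, and that sliding a motif means shifting it by one position and filling the freed position with any symbol. The paper's exact definitions may differ.

```python
from itertools import combinations

WILDCARD = "x"  # assumed wildcard symbol, matches any residue


def contains(seq, motif):
    """True if some window of seq matches motif (wildcards match anything)."""
    k = len(motif)
    return any(
        all(m == WILDCARD or m == c for m, c in zip(motif, seq[i:i + k]))
        for i in range(len(seq) - k + 1)
    )


def chi_square_support(pair, seqs, edges):
    """Pearson Chi-square of the 2x2 table: selected by the motif pair vs. interacting.

    Simplification: the statistic is also high for underrepresentation.
    """
    m1, m2 = pair
    s1 = {p for p, s in seqs.items() if contains(s, m1)}
    s2 = {p for p, s in seqs.items() if contains(s, m2)}
    a = b = c = d = 0
    for p, q in combinations(sorted(seqs), 2):
        selected = (p in s1 and q in s2) or (p in s2 and q in s1)
        interacting = frozenset((p, q)) in edges
        if selected and interacting:
            a += 1
        elif selected:
            b += 1
        elif interacting:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def slides(motif, alphabet):
    """Assumed sliding move: shift by one position, fill the freed position."""
    for ch in alphabet + WILDCARD:
        yield motif[1:] + ch   # slide left, new last symbol
        yield ch + motif[:-1]  # slide right, new first symbol


def neighbors(pair, alphabet):
    """All motif pairs obtained by sliding exactly one of the two motifs."""
    m1, m2 = pair
    for s in slides(m1, alphabet):
        if s != m1:
            yield (s, m2)
    for s in slides(m2, alphabet):
        if s != m2:
            yield (m1, s)


def steepest_ascent(start, seqs, edges, alphabet):
    """Move to the best-scoring neighbor until no neighbor strictly improves."""
    current, score = start, chi_square_support(start, seqs, edges)
    while True:
        scored = [(chi_square_support(p, seqs, edges), p)
                  for p in neighbors(current, alphabet)]
        if not scored:
            return current, score
        best_score, best = max(scored, key=lambda t: t[0])
        if best_score <= score:
            return current, score
        current, score = best, best_score
```

A toy call might look like `steepest_ascent(("xx", "xx"), seqs, edges, "AC")`, where `seqs` maps protein names to sequences and `edges` is a set of `frozenset` pairs of interacting proteins. Because each step requires a strict improvement over a finite search space, the loop terminates, but only at a local optimum.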