Bryan Kenneth, Cunningham Pádraig, Bolshakova Nadia
The Machine Learning Group, Faculty of Computer Science, Trinity College Dublin, Ireland.
IEEE Trans Inf Technol Biomed. 2006 Jul;10(3):519-25. doi: 10.1109/titb.2006.872073.
In a gene expression data matrix, a bicluster is a submatrix of genes and conditions that exhibits a high correlation of expression activity across both rows and columns. The problem of locating the most significant bicluster has been shown to be NP-complete. Heuristic approaches such as Cheng and Church's greedy node deletion algorithm have been previously employed. It is to be expected that stochastic search techniques such as evolutionary algorithms or simulated annealing might improve upon such greedy techniques. In this paper we show that an approach based on simulated annealing is well suited to this problem, and we present a comparative evaluation of simulated annealing and node deletion on a variety of datasets. We show that simulated annealing discovers more significant biclusters in many cases. Furthermore, we also test the ability of our technique to locate biologically verifiable biclusters within an annotated set of genes.
在基因表达数据矩阵中,双聚类是基因和条件的一个子矩阵,在行和列上均表现出高表达活性相关性。已证明定位最显著双聚类的问题是NP完全问题。此前已采用启发式方法,如程和丘奇的贪婪节点删除算法。可以预期,诸如进化算法或模拟退火等随机搜索技术可能会改进此类贪婪技术。在本文中,我们表明基于模拟退火的方法非常适合该问题,并且我们在各种数据集上对模拟退火和节点删除进行了比较评估。我们表明,在许多情况下,模拟退火能发现更显著的双聚类。此外,我们还测试了我们的技术在一组注释基因中定位生物学上可验证的双聚类的能力。