Dpt. Lenguajes y Sistemas Informáticos, ETSII, University of Seville, Avda. Reina Mercedes s/n, 41012 Seville, Spain.
BioData Min. 2011 Jan 24;4(1):3. doi: 10.1186/1756-0381-4-3.
The analysis of data generated by microarray technology helps explain how genetic information becomes functional gene products. Biclustering algorithms identify groups of genes that are co-expressed under a set of experimental conditions. Recently, new biclustering methods based on metaheuristics have been proposed. Most of them use the Mean Squared Residue as the merit function, but this measure may fail to detect patterns that are interesting and relevant from a biological point of view, such as shifting and scaling patterns. Discovering such patterns is important because genes commonly show similar behavior even though their expression levels vary in different ranges or magnitudes.
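The limitation of the Mean Squared Residue can be illustrated with a small sketch. It assumes the usual Cheng and Church definition of the measure (each residue is the value minus its row mean, minus its column mean, plus the overall mean) and uses made-up toy expression values; the function and variable names are illustrative and do not come from the paper.

```python
def mean_squared_residue(bicluster):
    """Mean Squared Residue of a genes x conditions matrix (list of rows)."""
    n_rows = len(bicluster)
    n_cols = len(bicluster[0])
    row_means = [sum(row) / n_cols for row in bicluster]
    col_means = [sum(row[j] for row in bicluster) / n_rows for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows
    total = 0.0
    for i, row in enumerate(bicluster):
        for j, value in enumerate(row):
            residue = value - row_means[i] - col_means[j] + overall_mean
            total += residue ** 2
    return total / (n_rows * n_cols)


# Toy data (assumed for illustration): one base profile over four conditions.
base = [1.0, 3.0, 2.0, 5.0]
# Shifting pattern: every gene is the base profile plus a constant.
shifting = [base, [v + 2.0 for v in base], [v + 10.0 for v in base]]
# Scaling pattern: every gene is the base profile times a constant.
scaling = [base, [v * 2.0 for v in base], [v * 5.0 for v in base]]

msr_shifting = mean_squared_residue(shifting)
msr_scaling = mean_squared_residue(scaling)
print(msr_shifting, msr_scaling)
```

In this toy case the shifting bicluster scores essentially zero, while the scaling bicluster receives a clearly positive residue even though its genes follow exactly the same trend, so a search that minimizes this measure would tend to reject it.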
Scatter Search is an evolutionary technique based on the evolution of a small set of solutions chosen according to quality and diversity criteria. This paper presents a Scatter Search algorithm for finding biclusters in gene expression data. The proposed fitness function is based on the linear correlation among genes, so that it can detect shifting and scaling patterns, and an improvement method is included to select only positively correlated genes.
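As a rough illustration of why a correlation-based measure captures shifting and scaling patterns, the sketch below scores a group of genes by their average pairwise Pearson correlation and then keeps only the genes that are positively correlated with a reference gene. This is an assumption-laden sketch, not the fitness function or improvement method defined in the paper: the averaging scheme, the choice of the first gene as reference, and all names (`pearson`, `average_correlation`, `keep_positively_correlated`) are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes neither is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_correlation(genes):
    """Mean Pearson correlation over all gene pairs (needs at least two genes)."""
    pairs = list(combinations(genes, 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


def keep_positively_correlated(genes, reference_index=0):
    """Keep genes whose correlation with the reference gene is positive.

    Using a single reference gene is an assumption made for this sketch.
    """
    reference = genes[reference_index]
    return [g for g in genes if pearson(reference, g) > 0]


# Toy data (assumed for illustration).
base = [1.0, 3.0, 2.0, 5.0]
genes = [
    base,
    [2.0 * v + 1.0 for v in base],   # scaled and shifted copy of the base profile
    [-1.0 * v + 6.0 for v in base],  # negatively correlated profile
]

filtered = keep_positively_correlated(genes)
score_before = average_correlation(genes)
score_after = average_correlation(filtered)
print(len(filtered), score_before, score_after)
```

The scaled and shifted gene correlates perfectly with the base profile, so a correlation-based score treats the two as one pattern regardless of their ranges, and the filtering step drops the negatively correlated gene.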
The proposed algorithm has been tested on three real data sets (the Yeast Cell Cycle dataset, the human B-cells lymphoma dataset and the Yeast Stress dataset), finding a remarkable number of biclusters with shifting and scaling patterns. In addition, the performance of the proposed method and fitness function is compared with that of CC, OPSM, ISA, BiMax, xMotifs and Samba using the Gene Ontology database.