Machine Intelligence Unit, Indian Statistical Institute, 203 BT Road, Kolkata 700108, West Bengal, India.
IEEE/ACM Trans Comput Biol Bioinform. 2013 Mar-Apr;10(2):286-99. doi: 10.1109/TCBB.2012.103.
Gene expression data clustering is one of the important tasks of functional genomics, as it provides a powerful tool for studying functional relationships of genes in a biological process. Identifying coexpressed groups of genes is the basic challenge in the gene clustering problem. In this regard, a gene clustering algorithm, termed robust rough-fuzzy c-means, is proposed that judiciously integrates the merits of rough sets and fuzzy sets. While the lower and upper approximations of rough sets deal with uncertainty, vagueness, and incompleteness in cluster definition, the integration of the probabilistic and possibilistic memberships of fuzzy sets enables efficient handling of overlapping partitions in noisy environments. The concept of a possibilistic lower bound and a probabilistic boundary of a cluster, introduced in robust rough-fuzzy c-means, enables efficient selection of gene clusters. An efficient method is also proposed to select the initial prototypes of different gene clusters, which enables the proposed c-means algorithm to converge to an optimum or near-optimum solution and helps to discover coexpressed gene clusters. The effectiveness of the algorithm, along with a comparison with other algorithms, is demonstrated both qualitatively and quantitatively on 14 yeast microarray data sets.
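The abstract describes the clustering scheme only at a high level. As a rough illustration, the Python sketch below assumes a common rough-fuzzy c-means formulation: an object goes to the lower approximation of its best cluster unless its highest probabilistic membership exceeds another cluster's membership by at most a threshold `delta`, in which case it joins the boundary of every near-tied cluster. Prototypes are then updated from the lower approximation weighted by possibilistic memberships and the boundary weighted by probabilistic memberships, mixed by a weight `w`. The parameter names (`m1`, `m2`, `w`, `delta`), the possibilistic c-means scale `eta` with K = 1, and the use of caller-supplied initial prototypes in place of the paper's own prototype-selection method are all assumptions; this is not the authors' exact algorithm.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def weighted_mean(points, weights):
    """Weighted mean of a list of vectors (assumes positive weights)."""
    total = sum(weights)
    dim = len(points[0])
    return [sum(wt * p[d] for p, wt in zip(points, weights)) / total for d in range(dim)]


def rough_fuzzy_cmeans(X, V, m1=2.0, m2=2.0, w=0.9, delta=0.1, max_iter=100, tol=1e-6):
    """Hypothetical sketch: possibilistic lower approximations, probabilistic boundaries.

    X: list of data vectors (e.g. gene expression profiles).
    V: caller-supplied initial prototypes (assumption; not the paper's selection method).
    Returns final prototypes plus index lists of lower approximations and boundaries.
    """
    c, n = len(V), len(X)
    V = [list(v) for v in V]
    lower, boundary = [], []
    for _ in range(max_iter):
        # Squared distances, floored to avoid division by zero.
        d2 = [[max(sq_dist(x, v), 1e-12) for x in X] for v in V]
        # Probabilistic (fuzzy c-means) memberships: each object's memberships sum to 1.
        mu = [[1.0 / sum((d2[i][j] / d2[k][j]) ** (1.0 / (m1 - 1.0)) for k in range(c))
               for j in range(n)] for i in range(c)]
        # Scale parameter eta_i as in possibilistic c-means, with K = 1 (assumption).
        eta = [sum(mu[i][j] ** m1 * d2[i][j] for j in range(n)) /
               sum(mu[i][j] ** m1 for j in range(n)) for i in range(c)]
        # Possibilistic memberships (typicalities): not constrained to sum to 1.
        nu = [[1.0 / (1.0 + (d2[i][j] / eta[i]) ** (1.0 / (m2 - 1.0)))
               for j in range(n)] for i in range(c)]
        lower = [[] for _ in range(c)]
        boundary = [[] for _ in range(c)]
        for j in range(n):
            col = [mu[i][j] for i in range(c)]
            best = max(range(c), key=lambda i: col[i])
            close = [k for k in range(c) if k != best and col[best] - col[k] <= delta]
            if close:
                # Ambiguous object: boundary of the best cluster and every near-tied rival.
                for i in [best] + close:
                    boundary[i].append(j)
            else:
                # Unambiguous object: lower approximation of its best cluster only.
                lower[best].append(j)
        new_V = []
        for i in range(c):
            A = (weighted_mean([X[j] for j in lower[i]], [nu[i][j] ** m2 for j in lower[i]])
                 if lower[i] else None)
            B = (weighted_mean([X[j] for j in boundary[i]], [mu[i][j] ** m1 for j in boundary[i]])
                 if boundary[i] else None)
            if A is not None and B is not None:
                new_V.append([w * a + (1 - w) * b for a, b in zip(A, B)])
            elif A is not None:
                new_V.append(A)
            elif B is not None:
                new_V.append(B)
            else:
                new_V.append(V[i])  # empty cluster: keep the old prototype
        shift = max(sq_dist(a, b) for a, b in zip(V, new_V))
        V = new_V
        if shift < tol ** 2:
            break
    return V, lower, boundary
```

For example, with two tight groups of three points each and one extra point midway between them, starting from prototypes `[1, 1]` and `[4, 4]`, the midway point ends up in the boundary of both clusters while the other six points fall in the lower approximations of their own groups. The mixing weight `w` keeps the ambiguous point from pulling either prototype too far toward the middle.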