Odibat Omar, Reddy Chandan K
Department of Computer Science, Wayne State University, Detroit, MI, 48202.
Knowl Inf Syst. 2014 Dec;41(3):667-696. doi: 10.1007/s10115-013-0684-0.
Discriminative models are used to analyze the differences between two classes and to identify class-specific patterns. Most existing discriminative models use the entire feature space to compute the discriminative patterns for each class. Co-clustering has been proposed to capture patterns that are correlated in a subset of features, but it cannot handle discriminative patterns in labeled datasets. In certain biological applications, such as gene expression analysis, it is critical to consider discriminative patterns that are correlated only in a subset of the feature space. The objective of this paper is two-fold. First, it presents an algorithm to efficiently find arbitrarily positioned co-clusters in complex data. Second, it extends this co-clustering algorithm to discover discriminative co-clusters by incorporating class information into the co-cluster search process. In addition, we characterize discriminative co-clusters and propose three novel measures that can be used to evaluate the performance of any discriminative subspace pattern mining algorithm. We evaluated the proposed algorithms on several synthetic and real gene expression datasets, and the experimental results showed that they outperformed several existing algorithms from the literature.
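To make the notion of a discriminative co-cluster more concrete, the sketch below scores a fixed subset of genes (rows) by how coherent it is across the samples of one class compared with the samples of the other. The coherence measure (mean absolute pairwise Pearson correlation), the score (a simple difference), the function names, and the toy data are all illustrative assumptions. This is not the paper's co-cluster search algorithm or any of its three proposed evaluation measures.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def coherence(data, genes, samples):
    """Hypothetical coherence: mean absolute pairwise correlation of the
    selected genes, measured only over the selected samples (columns)."""
    profiles = [[data[g][s] for s in samples] for g in genes]
    pairs = list(combinations(range(len(genes)), 2))
    return sum(abs(pearson(profiles[i], profiles[j])) for i, j in pairs) / len(pairs)


def discriminative_score(data, genes, samples_a, samples_b):
    """Hypothetical score: how much more coherent the gene subset is in
    class A than in class B (near 1.0 = coherent only in class A)."""
    return coherence(data, genes, samples_a) - coherence(data, genes, samples_b)


# Toy expression matrix (rows = genes, columns = samples); columns 0-3 are
# class A, columns 4-7 are class B. The three genes scale together in class A
# but follow mutually uncorrelated patterns in class B.
DATA = [
    [1, 2, 3, 4, 6, 4, 6, 4],
    [2, 4, 6, 8, 6, 6, 4, 4],
    [3, 6, 9, 12, 6, 4, 4, 6],
]

if __name__ == "__main__":
    score = discriminative_score(DATA, [0, 1, 2], [0, 1, 2, 3], [4, 5, 6, 7])
    print(round(score, 6))  # 1.0
```

Here the gene subset is perfectly correlated within class A and uncorrelated within class B, so it would count as discriminative under this toy score. Searching over which rows and columns to include, which is the hard part the abstract refers to, is not attempted here.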