Similarity Measures for Comparing Biclusterings.

Author information

Horta Danilo, Campello Ricardo J G B

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2014 Sep-Oct;11(5):942-54. doi: 10.1109/TCBB.2014.2325016.

Abstract

The comparison of ordinary partitions of a set of objects is well established in the clustering literature, which includes several studies analyzing the properties of similarity measures for comparing partitions. However, similarity measures for clusterings are not readily applicable to biclusterings, since each bicluster is a tuple of two sets (of rows and columns), whereas a cluster is only a single set (of rows). Some biclustering similarity measures have been defined as minor contributions in papers that primarily propose and evaluate biclustering algorithms or report comparative analyses of such algorithms. As a consequence, some desirable properties of these measures have been overlooked in the literature. We review 14 biclustering similarity measures. We define eight desirable properties of a biclustering measure, discuss their importance, and prove which properties each of the reviewed measures has. We show examples, drawn from or inspired by important studies, in which several biclustering measures convey misleading evaluations due to the absence of one or more of the discussed properties. We also advocate the use of a more general comparison approach based on the idea of transforming the original problem of comparing biclusterings into an equivalent problem of comparing clustering partitions with overlapping clusters.
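The transformation mentioned in the last sentence can be illustrated with a minimal sketch. It assumes that each bicluster is mapped to the set of (row, column) cells it covers, so that a biclustering becomes a collection of possibly overlapping clusters of cells. The function names and the best-match Jaccard score below are hypothetical choices made for illustration only; they are not taken from the paper and are not claimed to be one of the 14 reviewed measures.

```python
from itertools import product


def bicluster_cells(rows, cols):
    """Return the set of (row, column) cells covered by one bicluster."""
    return set(product(rows, cols))


def to_cell_clustering(biclustering):
    """Map a list of (rows, cols) biclusters to a list of cell sets.

    The cell sets may overlap, so the result is a clustering with
    overlapping clusters rather than a partition.
    """
    return [bicluster_cells(rows, cols) for rows, cols in biclustering]


def jaccard(a, b):
    """Jaccard index of two sets (defined as 1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_match_similarity(found, reference):
    """Average, over found biclusters, of the best Jaccard match in reference.

    Illustrative stand-in only; note that it is not symmetric in its arguments.
    """
    found_cells = to_cell_clustering(found)
    reference_cells = to_cell_clustering(reference)
    if not found_cells or not reference_cells:
        return 0.0
    best = [max(jaccard(f, r) for r in reference_cells) for f in found_cells]
    return sum(best) / len(best)


# Two reference biclusters that share the cell (1, 1).
reference = [({0, 1}, {0, 1}), ({1, 2}, {1, 2})]
# One bicluster recovered exactly, one recovered only partially.
found = [({0, 1}, {0, 1}), ({2}, {2})]

print(best_match_similarity(found, reference))  # 0.625
```

In this toy case the first found bicluster matches a reference bicluster exactly (score 1), while the second covers one of the four cells of its best match (score 0.25), giving an average of 0.625. Once both biclusterings are expressed as cell sets, any measure for overlapping clusterings could be substituted for the simple score used here.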
