Reichardt Jörg, Leone Michele
Institute for Theoretical Physics, University of Würzburg, 97074 Würzburg, Germany.
Phys Rev Lett. 2008 Aug 15;101(7):078701. doi: 10.1103/PhysRevLett.101.078701. Epub 2008 Aug 13.
Can a cluster structure in a sparse relational data set, i.e., a network, be detected at all by unsupervised clustering techniques? We answer this question by means of statistical mechanics making our analysis independent of any particular algorithm used for clustering. We find a sharp transition from a phase in which the cluster structure is not detectable at all to a phase in which it can be detected with high accuracy. We calculate the transition point and the shape of the transition, i.e., the theoretically achievable accuracy, analytically. This illuminates theoretical limitations of data mining in networks and allows for an understanding and evaluation of the performance of a variety of algorithms.
在稀疏关系数据集(即网络)中的聚类结构能否通过无监督聚类技术被检测出来?我们借助统计力学来回答这个问题,使我们的分析独立于用于聚类的任何特定算法。我们发现从一个完全检测不到聚类结构的阶段到一个可以高精度检测到聚类结构的阶段存在一个明显的转变。我们通过分析计算出转变点和转变的形状,即理论上可达到的精度。这阐明了网络中数据挖掘的理论局限性,并有助于理解和评估各种算法的性能。