Darst Richard K, Nussinov Zohar, Fortunato Santo
Department of Biomedical Engineering and Computational Science, Aalto University School of Science, P.O. Box 12200, FI-00076, Finland.
Physics Department, Washington University in St. Louis, CB 1105, One Brookings Drive, St. Louis, Missouri 63130-4899, USA.
Phys Rev E Stat Nonlin Soft Matter Phys. 2014 Mar;89(3):032809. doi: 10.1103/PhysRevE.89.032809. Epub 2014 Mar 20.
Most algorithms to detect communities in networks typically work without any information on the cluster structure to be found, as one has no a priori knowledge of it, in general. Not surprisingly, knowing some features of the unknown partition could help its identification, yielding an improvement of the performance of the method. Here we show that, if the number of clusters was known beforehand, standard methods, like modularity optimization, would considerably gain in accuracy, mitigating the severe resolution bias that undermines the reliability of the results of the original unconstrained version. The number of clusters can be inferred from the spectra of the recently introduced nonbacktracking and flow matrices, even in benchmark graphs with realistic community structure. The limit of such a two-step procedure is the overhead of the computation of the spectra.
大多数用于检测网络中社区结构的算法通常在没有关于待发现的聚类结构的任何信息的情况下运行,因为一般来说人们对此没有先验知识。毫不奇怪,了解未知划分的一些特征有助于其识别,从而提高方法的性能。在这里我们表明,如果预先知道聚类的数量,像模块度优化这样的标准方法在准确性上会有显著提高,减轻严重的分辨率偏差,而这种偏差会破坏原始无约束版本结果的可靠性。即使在具有实际社区结构的基准图中,聚类的数量也可以从最近引入的非回溯矩阵和流矩阵的谱中推断出来。这种两步法的局限在于谱计算的计算开销。