Sun Jiachen, Feng Ling, Xie Jiarong, Ma Xiao, Wang Dashun, Hu Yanqing
School of Data and Computer Science, Sun Yat-sen University, Guangzhou, 510006, China.
School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, 510006, China.
Nat Commun. 2020 Jan 29;11(1):574. doi: 10.1038/s41467-020-14418-6.
Structure prediction is an important and widely studied problem in network science and machine learning, finding its applications in various fields. Despite the significant progress in prediction algorithms, the fundamental predictability of structures remains unclear, as networks' complex underlying formation dynamics are usually unobserved or difficult to describe. As such, there has been a lack of theoretical guidance on the practical development of algorithms for their absolute performances. Here, for the first time, we find that the normalized shortest compression length of a network structure can directly assess the structure predictability. Specifically, shorter binary string length from compression leads to higher structure predictability. We also analytically derive the origin of this linear relationship in artificial random networks. In addition, our finding leads to analytical results quantifying maximum prediction accuracy, and allows the estimation of the network dataset potential values through the size of the compressed network data file.
结构预测是网络科学和机器学习中一个重要且被广泛研究的问题,在各个领域都有其应用。尽管预测算法取得了显著进展,但结构的基本可预测性仍不明确,因为网络复杂的潜在形成动态通常难以观察或描述。因此,在算法实际开发以实现其绝对性能方面缺乏理论指导。在此,我们首次发现网络结构的归一化最短压缩长度可以直接评估结构可预测性。具体而言,压缩得到的二进制字符串长度越短,结构可预测性越高。我们还通过分析得出了人工随机网络中这种线性关系的起源。此外,我们的发现得出了量化最大预测准确性的分析结果,并允许通过压缩网络数据文件的大小来估计网络数据集的潜在价值。