Soler-Toscano Fernando, Zenil Hector, Delahaye Jean-Paul, Gauvrit Nicolas
Grupo de Lógica, Lenguaje e Información, Universidad de Sevilla, Sevilla, Spain; Algorithmic Nature Group, LABORES, Paris, France.
Unit of Computational Medicine, Karolinska Institutet, Stockholm, Sweden; Algorithmic Nature Group, LABORES, Paris, France.
PLoS One. 2014 May 8;9(5):e96223. doi: 10.1371/journal.pone.0096223. eCollection 2014.
Drawing on various notions from theoretical computer science, we present a novel numerical approach, motivated by the notion of algorithmic probability, to the problem of approximating the Kolmogorov-Chaitin complexity of short strings. The method is an alternative to the traditional lossless compression algorithms, which it may complement, the two being serviceable for different string lengths. We provide a thorough analysis for all Σ(n=1)(11) 2(n) binary strings of length n<12 and for most strings of length 12≤n≤16 by running all ~2.5 x 10(13) Turing machines with 5 states and 2 symbols (8 x 22(9) with reduction techniques) using the most standard formalism of Turing machines, used in for example the Busy Beaver problem. We address the question of stability and error estimation, the sensitivity of the continued application of the method for wider coverage and better accuracy, and provide statistical evidence suggesting robustness. As with compression algorithms, this work promises to deliver a range of applications, and to provide insight into the question of complexity calculation of finite (and short) strings. Additional material can be found at the Algorithmic Nature Group website at http://www.algorithmicnature.org. An Online Algorithmic Complexity Calculator implementing this technique and making the data available to the research community is accessible at http://www.complexitycalculator.com.
借鉴理论计算机科学中的各种概念,我们提出了一种新颖的数值方法,该方法受算法概率概念的启发,用于解决近似短字符串的柯尔莫哥洛夫 - 柴廷复杂度问题。该方法是传统无损压缩算法的一种替代方法,二者可能相互补充,适用于不同的字符串长度。通过使用图灵机最标准的形式主义(例如在忙碌海狸问题中使用的)运行所有约2.5×10¹³个具有5个状态和2个符号的图灵机(采用约简技术后为8×2²⁹个),我们对长度n < 12的所有Σ(n = 1)(11) 2ⁿ个二进制字符串以及长度12≤n≤16的大多数字符串进行了全面分析。我们讨论了稳定性和误差估计问题,该方法持续应用以实现更广泛覆盖和更高精度时的敏感性,并提供了表明其稳健性的统计证据。与压缩算法一样,这项工作有望带来一系列应用,并为有限(和短)字符串的复杂度计算问题提供见解。更多材料可在算法自然小组网站http://www.algorithmicnature.org上找到。一个实现此技术并向研究社区提供数据的在线算法复杂度计算器可在http://www.complexitycalculator.com上访问。