Roger Guimerà, Ignasi Reichardt, Antoni Aguilar-Mogas, Francesco A. Massucci, Manuel Miranda, Jordi Pallarès, Marta Sales-Pardo
ICREA, Barcelona 08010, Catalonia, Spain.
Department of Chemical Engineering, Universitat Rovira i Virgili, Tarragona 43007, Catalonia, Spain.
Sci Adv. 2020 Jan 31;6(5):eaav6971. doi: 10.1126/sciadv.aav6971. eCollection 2020 Jan.
Closed-form, interpretable mathematical models have been instrumental for advancing our understanding of the world; with the data revolution, we may now be in a position to uncover new such models for many systems from physics to the social sciences. However, to deal with increasing amounts of data, we need "machine scientists" that are able to extract these models automatically from data. Here, we introduce a Bayesian machine scientist, which establishes the plausibility of models using explicit approximations to the exact marginal posterior over models and establishes its prior expectations about models by learning from a large empirical corpus of mathematical expressions. It explores the space of models using Markov chain Monte Carlo. We show that this approach uncovers accurate models for synthetic and real data and provides out-of-sample predictions that are more accurate than those of existing approaches and of other nonparametric methods.
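To make the idea of exploring model space with Markov chain Monte Carlo concrete, here is a minimal sketch in Python and not the authors' implementation. It assumes the Bayesian information criterion (BIC) as the approximation to the marginal posterior, replaces the corpus-learned prior with a placeholder uniform prior energy, and swaps the open-ended space of mathematical expressions for a small fixed list of hypothetical candidate models. All function names, the candidate list, and the synthetic data are illustrative assumptions.

```python
import math
import random

# Hypothetical candidate space: each model is y = a*f(x), plus b if intercept=True.
# A fixed list stands in for the open-ended space of expressions (assumption).
CANDIDATES = {
    "a*x":        (lambda x: x,           False),
    "a*x + b":    (lambda x: x,           True),
    "a*x**2":     (lambda x: x * x,       False),
    "a*x**2 + b": (lambda x: x * x,       True),
    "a*exp(x)":   (lambda x: math.exp(x), False),
}

def sse_of_fit(feature, intercept, xs, ys):
    """Least-squares fit of y = a*f(x) (+ b); returns the sum of squared errors."""
    fs = [feature(x) for x in xs]
    n = len(xs)
    if intercept:
        f_mean, y_mean = sum(fs) / n, sum(ys) / n
        sff = sum((f - f_mean) ** 2 for f in fs)
        a = sum((f - f_mean) * (y - y_mean) for f, y in zip(fs, ys)) / sff
        b = y_mean - a * f_mean
    else:
        a = sum(f * y for f, y in zip(fs, ys)) / sum(f * f for f in fs)
        b = 0.0
    return sum((y - (a * f + b)) ** 2 for f, y in zip(fs, ys))

def description_length(name, xs, ys, prior_energy):
    """BIC/2 plus a prior energy: an assumed stand-in for -log posterior."""
    feature, intercept = CANDIDATES[name]
    n = len(xs)
    k = 2 if intercept else 1  # number of fitted parameters
    sse = max(sse_of_fit(feature, intercept, xs, ys), 1e-300)  # guard log(0)
    bic = n * math.log(sse / n) + k * math.log(n)  # Gaussian-error BIC, up to a constant
    return bic / 2.0 + prior_energy[name]

def sample_models(xs, ys, prior_energy, steps=2000, seed=0):
    """Metropolis sampling over candidates with a symmetric uniform proposal."""
    rng = random.Random(seed)
    names = list(CANDIDATES)
    dl = {m: description_length(m, xs, ys, prior_energy) for m in names}
    current = rng.choice(names)
    visits = {m: 0 for m in names}
    for _ in range(steps):
        proposal = rng.choice(names)
        delta = dl[proposal] - dl[current]
        # Accept with probability min(1, exp(-delta)).
        if delta <= 0 or rng.random() < math.exp(-delta):
            current = proposal
        visits[current] += 1
    return visits, dl

def make_data(n=50, seed=1):
    """Synthetic data from y = 2*x**2 + 1 with small Gaussian noise (illustrative)."""
    rng = random.Random(seed)
    xs = [0.1 * (i + 1) for i in range(n)]
    ys = [2.0 * x * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys

if __name__ == "__main__":
    xs, ys = make_data()
    # Placeholder uniform prior; the paper instead learns priors from a corpus.
    prior = {name: 0.0 for name in CANDIDATES}
    visits, dl = sample_models(xs, ys, prior)
    for name in sorted(visits, key=visits.get, reverse=True):
        print(f"{name:12s} visits={visits[name]:5d} description_length={dl[name]:.1f}")
```

In this sketch the fraction of steps the chain spends on each candidate plays the role of that model's estimated posterior plausibility. A sampler over a genuinely open-ended space would propose local edits to expression trees rather than draw from a fixed list.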