PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem.

Affiliation

The Swiss AI Lab IDSIA, University of Lugano, SUPSI, Lugano, Switzerland.

Publication information

Front Psychol. 2013 Jun 7;4:313. doi: 10.3389/fpsyg.2013.00313. eCollection 2013.

DOI: 10.3389/fpsyg.2013.00313

PMID: 23761771

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3675324/

Abstract

Most of computer science focuses on automatically solving given computational problems. I focus on automatically inventing or discovering problems in a way inspired by the playful behavior of animals and humans, to train a more and more general problem solver from scratch in an unsupervised fashion. Consider the infinite set of all computable descriptions of tasks with possibly computable solutions. Given a general problem-solving architecture, at any given time, the novel algorithmic framework PowerPlay (Schmidhuber, 2011) searches the space of possible pairs of new tasks and modifications of the current problem solver, until it finds a more powerful problem solver that provably solves all previously learned tasks plus the new one, while the unmodified predecessor does not. Newly invented tasks may require achieving a wow-effect by making previously learned skills more efficient, such that they require less time and space. New skills may (partially) re-use previously learned skills. The greedy search of typical PowerPlay variants uses time-optimal program search to order candidate pairs of tasks and solver modifications by their conditional computational (time and space) complexity, given the stored experience so far. The new task and its corresponding task-solving skill are those first found and validated. This biases the search toward pairs that can be described compactly and validated quickly. The computational costs of validating new tasks need not grow with task repertoire size. Standard problem solver architectures of personal computers or neural networks tend to generalize by solving numerous tasks outside the self-invented training set; PowerPlay's ongoing search for novelty keeps breaking the generalization abilities of its present solver. This is related to Gödel's sequence of increasingly powerful formal theories based on adding formerly unprovable statements to the axioms without affecting previously provable theorems. The continually increasing repertoire of problem-solving procedures can be exploited by a parallel search for solutions to additional externally posed tasks. PowerPlay may be viewed as a greedy but practical implementation of basic principles of creativity (Schmidhuber, 2006a, 2010). A first experimental analysis can be found in separate papers (Srivastava et al., 2012a,b, 2013).
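
The search loop described in the abstract can be illustrated with a deliberately tiny Python sketch. It is not the paper's implementation, and every name in it is hypothetical: the solver is assumed to be a plain lookup table, a task is assumed to be a pair (x, y) meaning "map input x to output y", a modification is assumed to be a single table entry (k, v), and the sum x + y + k + v is a crude stand-in for the conditional time and space complexity that the time-optimal program search would use to order candidates. What it does keep is the acceptance rule: take the first candidate pair for which the modified solver solves the new task and every stored task, while the unmodified solver fails on the new task.

```python
from itertools import count, product


def solves(solver, task):
    """A task (x, y) is solved if the lookup table maps x to y."""
    x, y = task
    return solver.get(x) == y


def candidate_pairs():
    """Yield (task, modification) pairs in order of increasing cost.

    Assumption: cost is x + y + k + v, a toy proxy for the conditional
    time/space complexity mentioned in the abstract.
    """
    for budget in count():
        for x, y, k, v in product(range(budget + 1), repeat=4):
            if x + y + k + v == budget:
                yield (x, y), (k, v)


def powerplay_step(solver, repertoire):
    """Return the first validated (new_solver, new_task) pair."""
    for task, (k, v) in candidate_pairs():
        if solves(solver, task):
            continue  # not novel: the unmodified solver already solves it
        new_solver = {**solver, k: v}  # apply the candidate modification
        if not solves(new_solver, task):
            continue  # the modification does not solve the new task
        if all(solves(new_solver, old) for old in repertoire):
            return new_solver, task  # old skills intact, new task solved


def powerplay(steps):
    """Run the greedy loop for a fixed number of accepted tasks."""
    solver, repertoire = {}, []
    for _ in range(steps):
        solver, task = powerplay_step(solver, repertoire)
        repertoire.append(task)
    return solver, repertoire


if __name__ == "__main__":
    final_solver, learned = powerplay(3)
    print(learned, final_solver)
```

In this toy setting, a candidate such as task (0, 1) with modification (0, 1) is rejected once (0, 0) is in the repertoire, because overwriting the entry for input 0 would break a previously learned task. Re-checking the whole repertoire on every candidate is the simplest validation scheme; the abstract notes that validation cost need not grow with repertoire size, which this sketch does not attempt to show.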

Similar articles

First experiments with POWERPLAY.

Neural Netw. 2013 May;41:130-6. doi: 10.1016/j.neunet.2013.01.022. Epub 2013 Feb 10.

Deep Learning Solution of the Eigenvalue Problem for Differential Operators.

Neural Comput. 2023 May 12;35(6):1100-1134. doi: 10.1162/neco_a_01583.

Mixing neural networks, continuation and symbolic computation to solve parametric systems of non linear equations.

Neural Netw. 2024 Aug;176:106316. doi: 10.1016/j.neunet.2024.106316. Epub 2024 Apr 12.

Quantifying insightful problem solving: a modified compound remote associates paradigm using lexical priming to parametrically modulate different sources of task difficulty.

Psychol Res. 2020 Mar;84(2):528-545. doi: 10.1007/s00426-018-1042-3. Epub 2018 Jun 27.

Progressive Interpretation Synthesis: Interpreting Task Solving by Quantifying Previously Used and Unused Information.

Neural Comput. 2022 Dec 14;35(1):38-57. doi: 10.1162/neco_a_01542.

Learning a Set of Interrelated Tasks by Using a Succession of Motor Policies for a Socially Guided Intrinsically Motivated Learner.

Front Neurorobot. 2019 Jan 8;12:87. doi: 10.3389/fnbot.2018.00087. eCollection 2018.

The effectiveness of internet-based e-learning on clinician behavior and patient outcomes: a systematic review protocol.

JBI Database System Rev Implement Rep. 2015 Jan;13(1):52-64. doi: 10.11124/jbisrir-2015-1919.

A Fast Binary Quadratic Programming Solver Based on Stochastic Neighborhood Search.

IEEE Trans Pattern Anal Mach Intell. 2022 Jan;44(1):32-49. doi: 10.1109/TPAMI.2020.3010811. Epub 2021 Dec 7.

Discovering Neural Nets with Low Kolmogorov Complexity and High Generalization Capability.

Neural Netw. 1997 Jul;10(5):857-873. doi: 10.1016/s0893-6080(96)00127-x.

Cited by

Empowerment contributes to exploration behaviour in a creative video game.

Nat Hum Behav. 2023 Sep;7(9):1481-1489. doi: 10.1038/s41562-023-01661-2. Epub 2023 Jul 24.

General intelligence requires rethinking exploration.

R Soc Open Sci. 2023 Jun 21;10(6):230539. doi: 10.1098/rsos.230539. eCollection 2023 Jun.

Integrated world modeling theory expanded: Implications for the future of consciousness.

Front Comput Neurosci. 2022 Nov 24;16:642397. doi: 10.3389/fncom.2022.642397. eCollection 2022.

The Radically Embodied Conscious Cybernetic Bayesian Brain: From Free Energy to Free Will and Back Again.

Entropy (Basel). 2021 Jun 20;23(6):783. doi: 10.3390/e23060783.

Intrinsically Motivated Exploration of Learned Goal Spaces.

Front Neurorobot. 2021 Jan 12;14:555271. doi: 10.3389/fnbot.2020.555271. eCollection 2020.

ToyArchitecture: Unsupervised learning of interpretable models of the environment.

PLoS One. 2020 May 18;15(5):e0230432. doi: 10.1371/journal.pone.0230432. eCollection 2020.

Intrinsic motivations and open-ended development in animals, humans, and robots: an overview.

Front Psychol. 2014 Sep 9;5:985. doi: 10.3389/fpsyg.2014.00985. eCollection 2014.

Curiosity driven reinforcement learning for motion planning on humanoids.

Front Neurorobot. 2014 Jan 6;7:25. doi: 10.3389/fnbot.2013.00025.

An intrinsic value system for developing multiple invariant representations with incremental slowness learning.

Front Neurorobot. 2013 May 30;7:9. doi: 10.3389/fnbot.2013.00009. eCollection 2013.

References

First experiments with POWERPLAY.

Neural Netw. 2013 May;41:130-6. doi: 10.1016/j.neunet.2013.01.022. Epub 2013 Feb 10.

Parameter-exploring policy gradients.

Neural Netw. 2010 May;23(4):551-9. doi: 10.1016/j.neunet.2009.12.004. Epub 2009 Dec 16.

A theory of human curiosity.

Br J Psychol. 1954 Aug;45(3):180-91. doi: 10.1111/j.2044-8295.1954.tb01243.x.

Long short-term memory.

Neural Comput. 1997 Nov 15;9(8):1735-80. doi: 10.1162/neco.1997.9.8.1735.
