Google Deepmind, Mountain View, CA, USA.
Computer Science Department, New York University, New York, NY, USA.
Nature. 2024 Jan;625(7995):476-482. doi: 10.1038/s41586-023-06747-5. Epub 2024 Jan 17.
Proving mathematical theorems at the olympiad level represents a notable milestone in human-level automated reasoning, owing to their reputed difficulty among the world's best talents in pre-university mathematics. Current machine-learning approaches, however, are not applicable to most mathematical domains owing to the high cost of translating human proofs into machine-verifiable format. The problem is even worse for geometry because of its unique translation challenges, resulting in severe scarcity of training data. We propose AlphaGeometry, a theorem prover for Euclidean plane geometry that sidesteps the need for human demonstrations by synthesizing millions of theorems and proofs across different levels of complexity. AlphaGeometry is a neuro-symbolic system that uses a neural language model, trained from scratch on our large-scale synthetic data, to guide a symbolic deduction engine through infinite branching points in challenging problems. On a test set of 30 latest olympiad-level problems, AlphaGeometry solves 25, outperforming the previous best method that only solves ten problems and approaching the performance of an average International Mathematical Olympiad (IMO) gold medallist. Notably, AlphaGeometry produces human-readable proofs, solves all geometry problems in the IMO 2000 and 2015 under human expert evaluation and discovers a generalized version of a translated IMO theorem in 2004.
在奥林匹克竞赛水平上证明数学定理是人类水平自动推理的一个重要里程碑,因为它们在大学预科数学领域中被认为是世界上最优秀人才的难题。然而,由于将人类证明转换为机器可验证格式的成本很高,当前的机器学习方法并不适用于大多数数学领域。由于其独特的翻译挑战,几何问题甚至更糟,导致训练数据严重匮乏。我们提出了 AlphaGeometry,这是一个用于欧几里得平面几何的定理证明器,它通过合成不同复杂程度的数百万个定理和证明来避免人类演示的需要。AlphaGeometry 是一个神经符号系统,它使用从我们的大规模合成数据中从头开始训练的神经语言模型来引导符号推理引擎通过具有挑战性问题中的无限分支点。在一个包含 30 个最新奥林匹克竞赛水平问题的测试集中,AlphaGeometry 解决了 25 个问题,优于之前仅解决 10 个问题的最佳方法,并接近平均国际数学奥林匹克竞赛(IMO)金牌得主的表现。值得注意的是,AlphaGeometry 生成了人类可读的证明,在人类专家评估下解决了 IMO 2000 和 2015 年的所有几何问题,并在 2004 年发现了一个翻译 IMO 定理的通用版本。