Arithmetic with language models: From memorization to computation.

Affiliation

Department of Computer Science and Engineering, University of Bologna, Italy.

Publication information

Neural Netw. 2024 Nov;179:106550. doi: 10.1016/j.neunet.2024.106550. Epub 2024 Jul 17.

Abstract

A better understanding of the emergent computation and problem-solving capabilities of recent large language models is of paramount importance to further improve them and broaden their applicability. This work investigates how a language model, trained to predict the next token, can perform arithmetic computations generalizing beyond training data. Binary addition and multiplication constitute a good testbed for this purpose, since they require a very small vocabulary and exhibit relevant input/output discontinuities making smooth input interpolation ineffective for novel data. We successfully trained a light language model to learn these tasks and ran a number of experiments to investigate the extrapolation capabilities and internal information processing. Our findings support the hypothesis that the language model works as an Encoding-Regression-Decoding machine where the computation takes place in the value space once the input token representation is mapped to an appropriate internal representation.
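To make the testbed concrete, here is a minimal sketch, assuming a hypothetical five-symbol vocabulary (`0`, `1`, `+`, `*`, `=`), a fixed operand width, and an `a+b=` prompt layout. It serializes binary addition and multiplication problems as next-token prediction examples and compares how far apart the outputs of two neighbouring inputs are. The vocabulary, layout, and helper names (`make_example`, `to_bits`, `hamming`) are illustrative assumptions, not the tokenization or code used in the paper. The point is only to show why smooth interpolation over inputs is ineffective: flipping a single input bit can trigger a carry chain that changes most of the output bits.

```python
# Hypothetical vocabulary: binary digits plus operator and separator symbols.
# This is an illustrative assumption, not the paper's actual tokenization.
VOCAB = ["0", "1", "+", "*", "="]
TOKEN_ID = {tok: i for i, tok in enumerate(VOCAB)}


def to_bits(value: int, width: int) -> str:
    """Zero-padded binary string, most significant bit first."""
    return format(value, f"0{width}b")


def make_example(a: int, b: int, op: str, width: int = 8) -> tuple[list[int], list[int]]:
    """Serialize 'a op b =' as prompt token ids and the exact result as target token ids.

    Hypothetical layout: fixed-width operands; the sum gets width + 1 bits
    (room for the final carry), the product gets 2 * width bits.
    """
    if op == "+":
        result, out_width = a + b, width + 1
    elif op == "*":
        result, out_width = a * b, 2 * width
    else:
        raise ValueError(f"unsupported operator: {op!r}")
    prompt = to_bits(a, width) + op + to_bits(b, width) + "="
    target = to_bits(result, out_width)
    return [TOKEN_ID[c] for c in prompt], [TOKEN_ID[c] for c in target]


def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(c1 != c2 for c1, c2 in zip(x, y))


# Discontinuity demo: two prompts that differ in a single token (the last bit of b).
width = 8
a, b = 0b01111111, 0b00000001
p1 = to_bits(a, width) + "+" + to_bits(b, width) + "="
p2 = to_bits(a, width) + "+" + to_bits(b ^ 1, width) + "="
y1 = to_bits(a + b, width + 1)        # 127 + 1 = 128
y2 = to_bits(a + (b ^ 1), width + 1)  # 127 + 0 = 127
print(hamming(p1, p2), hamming(y1, y2))  # → 1 8
```

The two prompts differ in one token, yet the 9-bit sums `010000000` and `001111111` differ in 8 positions, which is the kind of input/output discontinuity the abstract refers to.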
