Department of Computer Science and Engineering, University of Bologna, Italy.
Neural Netw. 2024 Nov;179:106550. doi: 10.1016/j.neunet.2024.106550. Epub 2024 Jul 17.
A better understanding of the emergent computation and problem-solving capabilities of recent large language models is of paramount importance to further improve them and broaden their applicability. This work investigates how a language model, trained to predict the next token, can perform arithmetic computations that generalize beyond the training data. Binary addition and multiplication constitute a good testbed for this purpose, since they require a very small vocabulary and exhibit relevant input/output discontinuities, making smooth input interpolation ineffective for novel data. We successfully trained a light language model to learn these tasks and ran a number of experiments to investigate its extrapolation capabilities and internal information processing. Our findings support the hypothesis that the language model works as an Encoding-Regression-Decoding machine, where the computation takes place in the value space once the input token representation has been mapped to an appropriate internal representation.
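To illustrate why binary arithmetic needs only a tiny vocabulary for next-token training, the following minimal sketch generates example strings for addition and multiplication. The prompt layout, operand width, and the `<eos>` marker are assumptions for illustration, not the exact encoding used in the paper.

```python
# Hypothetical data-generation sketch for binary addition/multiplication
# as a next-token-prediction task; format details are assumed, not from the paper.
import random

VOCAB = ["0", "1", "+", "*", "=", "<eos>"]  # small symbol set sufficient for the task

def make_example(n_bits: int = 8, op: str = "+") -> str:
    """Return one training string such as '00001011+00000010=1101<eos>'."""
    a = random.randrange(2 ** n_bits)
    b = random.randrange(2 ** n_bits)
    result = a + b if op == "+" else a * b
    return f"{a:0{n_bits}b}{op}{b:0{n_bits}b}={result:b}<eos>"

if __name__ == "__main__":
    random.seed(0)
    for _ in range(3):
        print(make_example(op=random.choice(["+", "*"])))
```

Crucially for the testbed, small changes in the operand bit strings can produce large, discontinuous changes in the result string, which is what makes smooth interpolation over training examples insufficient for novel inputs.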