Li Yuxiao, Michaud Eric J, Baek David D, Engels Joshua, Sun Xiaoqing, Tegmark Max
Beneficial AI Foundation (BAIF), Cambridge, MA 02139, USA.
Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
Entropy (Basel). 2025 Mar 27;27(4):344. doi: 10.3390/e27040344.
Sparse autoencoders have recently produced dictionaries of high-dimensional vectors corresponding to the universe of concepts represented by large language models. We find that this concept universe has interesting structure at three levels: (1) The "atomic" small-scale structure contains "crystals" whose faces are parallelograms or trapezoids, generalizing well-known examples such as (). We find that the quality of such parallelograms and associated function vectors improves greatly when global distractor directions such as word length are projected out, which can be done efficiently with linear discriminant analysis. (2) The "brain" intermediate-scale structure has significant spatial modularity; for example, math and code features form a "lobe" akin to the functional lobes seen in neural fMRI images. We quantify the spatial locality of these lobes with multiple metrics and find that clusters of co-occurring features, at a coarse enough scale, also cluster together spatially far more than one would expect if feature geometry were random. (3) The "galaxy"-scale large-scale structure of the feature point cloud is not isotropic, but instead has a power-law eigenvalue spectrum, with the steepest slope in the middle layers. We also quantify how the clustering entropy depends on the layer.
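The parallelogram test in (1) and the eigenvalue power law in (3) can be illustrated with a small NumPy sketch. It uses hand-built toy vectors, not real SAE features; `parallelogram_error`, `project_out`, and `spectrum_slope` are hypothetical helper names, and the word-length values are illustrative. The distractor direction is assumed to be known in advance and is removed by plain orthogonal projection, a simpler stand-in for the linear discriminant analysis the abstract describes. The slope is estimated with an ordinary least-squares line in log-log space, one common choice that is not necessarily the authors' procedure.

```python
import numpy as np


def parallelogram_error(a, b, c, d):
    """Relative mismatch between the edges b - a and d - c.

    Returns 0.0 for a perfect parallelogram (b - a == d - c); larger values
    mean the four points are further from forming one.
    """
    u, v = b - a, d - c
    return float(np.linalg.norm(u - v) / (np.linalg.norm(u) + np.linalg.norm(v)))


def project_out(x, directions):
    """Remove the span of `directions` (one direction per row) from x.

    Plain orthogonal projection onto the complement of the given directions;
    a stand-in for an LDA-based step, not a reproduction of it.
    """
    basis, _ = np.linalg.qr(np.atleast_2d(directions).T)  # orthonormal columns
    return x - (x @ basis) @ basis.T


def spectrum_slope(points):
    """Log-log slope of the covariance eigenvalue spectrum of a point cloud.

    Sorts eigenvalues in decreasing order and fits log(eigenvalue) against
    log(rank) by least squares; a power law lambda_k ~ k**s gives slope s.
    """
    centered = points - points.mean(axis=0)
    eigvals = np.linalg.eigvalsh(centered.T @ centered / len(points))[::-1]
    eigvals = eigvals[eigvals > 1e-12]  # drop numerically zero directions
    ranks = np.arange(1, len(eigvals) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(eigvals), 1)
    return float(slope)


# Toy illustration (hypothetical vectors, not real SAE features):
# axis 0 = "word length" distractor, axis 1 = shared base,
# axis 2 = gender offset, axis 3 = royalty offset.
e = np.eye(8)
length = {"man": 3, "woman": 5, "king": 4, "queen": 5}
concept = {
    "man": e[1],
    "woman": e[1] + e[2],
    "king": e[1] + e[3],
    "queen": e[1] + e[2] + e[3],
}
vec = {w: concept[w] + length[w] * e[0] for w in concept}

before = parallelogram_error(vec["man"], vec["woman"], vec["king"], vec["queen"])
clean = {w: project_out(v, e[0]) for w, v in vec.items()}
after = parallelogram_error(clean["man"], clean["woman"], clean["king"], clean["queen"])
print(f"before: {before:.3f}, after: {after:.3f}")  # before: 0.274, after: 0.000

# Point cloud whose variance along axis k is proportional to k**-1.5.
ranks = np.arange(1, 11)
scales = ranks ** -0.75
cloud = np.vstack([np.diag(scales), -np.diag(scales)])
print(f"spectrum slope: {spectrum_slope(cloud):.2f}")  # spectrum slope: -1.50
```

In the toy example the length axis makes the two "gender" edges differ, so the four points are only approximately a parallelogram; projecting that axis out makes the edges identical and the error drops to zero. With real features the improvement would be partial, and the distractor directions would have to be estimated from data rather than assumed.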