Dipartimento di Fisica, Università di Trieste, Strada Costiera 11, I-34151 Trieste, Italy.
International School for Advanced Studies (SISSA), Via Bonomea 265, I-34136 Trieste, Italy.
Phys Rev Lett. 2023 Jun 9;130(23):236401. doi: 10.1103/PhysRevLett.130.236401.
The transformer architecture has become the state-of-the-art model for natural language processing tasks and, more recently, for computer vision tasks as well, giving rise to the vision transformer (ViT) architecture. Its key feature is the ability to describe long-range correlations among the elements of the input sequences through the so-called self-attention mechanism. Here, we propose an adaptation of the ViT architecture with complex-valued parameters to define a new class of variational neural-network states for quantum many-body systems, the ViT wave function. We apply this idea to the one-dimensional J_{1}-J_{2} Heisenberg model, demonstrating that a relatively simple parametrization achieves excellent results in both the gapped and gapless phases. In this case, high accuracies are obtained with a relatively shallow architecture containing a single self-attention layer, which greatly simplifies the original architecture. Still, deeper structures can be optimized and used for more challenging models, most notably highly frustrated systems in two dimensions. The success of the ViT wave function relies on mixing both local and global operations, enabling the study of large systems with high accuracy.
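For reference (the abstract names the model but does not define it), the one-dimensional J_{1}-J_{2} Heisenberg model is the frustrated spin-1/2 chain with nearest- and next-nearest-neighbor antiferromagnetic couplings,

    H = J_1 \sum_i \mathbf{S}_i \cdot \mathbf{S}_{i+1} + J_2 \sum_i \mathbf{S}_i \cdot \mathbf{S}_{i+2},

whose ground state is gapless for small J_2/J_1 and gapped (dimerized) above the critical ratio J_2/J_1 ≈ 0.2411, which is why this model probes both of the phases mentioned above.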
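To make the construction concrete, below is a minimal sketch (not the authors' code) of how such a ViT wave function can be assembled: the spin configuration is split into patches, each patch is embedded with complex-valued weights, one self-attention layer mixes all patches, and a pooling step yields a single complex log-amplitude log ψ(σ). All dimensions, the residual update, and the softmax-free attention normalization are illustrative assumptions, not the published architecture.

```python
# Sketch of a ViT-style variational wave function with complex parameters:
# patch embedding -> one self-attention layer -> pooling -> complex log-amplitude.
# Sizes and the attention normalization are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

N, b, d = 16, 4, 8        # spins, patch size, embedding dim (assumed values)
P = N // b                # number of patches

def cplx(*shape, scale=0.1):
    """Random complex-valued parameters."""
    return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

W_emb = cplx(b, d)                                   # local patch embedding
W_q, W_k, W_v = cplx(d, d), cplx(d, d), cplx(d, d)   # self-attention weights
w_out = cplx(d)                                      # final projection to a scalar

def log_psi(sigma):
    """Complex log-amplitude log psi(sigma) for a configuration in {-1,+1}^N."""
    x = sigma.reshape(P, b).astype(complex) @ W_emb  # (P, d) patch tokens
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    # The attention matrix couples every patch to every other one: this is
    # the global operation that captures long-range correlations in one layer.
    a = q @ k.T / np.sqrt(d)                         # (P, P) attention scores
    x = x + a @ v                                    # residual update
    return np.sum(x, axis=0) @ w_out                 # pool patches -> scalar

sigma = rng.choice([-1, 1], size=N)
print(log_psi(sigma))     # complex log-amplitude of the basis state |sigma>
```

The combination of strictly local patch embeddings with one global attention step is the "mixing of local and global operations" the abstract credits for the method's accuracy on large systems.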