Interpreting Potts and Transformer Protein Models Through the Lens of Simplified Attention

The established approach to unsupervised protein contact prediction estimates coevolving positions using undirected graphical models: a Potts model is trained on a multiple sequence alignment (MSA). Meanwhile, increasingly large Transformers are being pretrained on unlabeled, unaligned protein sequence databases and show competitive performance on protein contact prediction. We argue that attention is a principled model of protein interactions, grounded in real properties of protein family data. We introduce an energy-based attention layer, factored attention, which, in a certain limit, recovers a Potts model, and use it to contrast Potts models and Transformers. We show that the Transformer leverages hierarchical signal in protein family databases that single-layer models do not capture. This raises the exciting possibility of developing powerful structured models of protein family databases.
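The abstract does not spell out the Potts energy or the form of factored attention, so the following is only a minimal sketch under stated assumptions. It assumes the standard Potts energy E(x) = -Σ_i h_i(x_i) - Σ_{i<j} J_ij(x_i, x_j), and it assumes that factored attention builds the couplings J from attention weights that depend on position only, combined with one amino-acid interaction matrix per head. All names (`potts_energy`, `factored_couplings`, `Q`, `K`, `V`) and all sizes are hypothetical and chosen for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def potts_energy(seq, h, J):
    """Assumed Potts energy: -sum_i h[i, x_i] - sum_{i<j} J[i, j, x_i, x_j].

    seq: (L,) integer-encoded aligned sequence
    h:   (L, A) single-site fields
    J:   (L, L, A, A) pairwise couplings (only pairs with i < j are used)
    """
    idx = np.arange(len(seq))
    fields = h[idx, seq].sum()
    # pair[i, j] = J[i, j, seq[i], seq[j]]
    pair = J[idx[:, None], idx[None, :], seq[:, None], seq[None, :]]
    return -(fields + np.triu(pair, k=1).sum())

def factored_couplings(Q, K, V):
    """Assumed factored-attention couplings.

    Q, K: (H, L, d) per-head queries and keys computed from positions only
    V:    (H, A, A) per-head amino-acid interaction matrices
    Returns J with J[i, j] = sum_h softmax(Q_h K_h^T / sqrt(d))[i, j] * V_h.
    """
    d = Q.shape[-1]
    attn = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d), axis=-1)  # (H, L, L)
    return np.einsum("hij,hab->ijab", attn, V)

# Illustrative sizes (hypothetical): length 6, 20 amino acids, 4 heads, dim 8.
rng = np.random.default_rng(0)
L, A, H, d = 6, 20, 4, 8
Q = rng.normal(size=(H, L, d))
K = rng.normal(size=(H, L, d))
V = rng.normal(size=(H, A, A))
h = rng.normal(size=(L, A))
seq = rng.integers(0, A, size=L)

J = factored_couplings(Q, K, V)
energy = potts_energy(seq, h, J)
```

In this sketch the couplings are shared across positions through a small number of heads, whereas a Potts model stores an independent A-by-A matrix for every pair of positions. That difference in parameterization is one way to read the contrast the abstract draws between the two model families.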
