Rotamer-free protein sequence design based on deep learning and self-consistency.

Author information

Liu Yufeng, Zhang Lu, Wang Weilun, Zhu Min, Wang Chenchen, Li Fudong, Zhang Jiahai, Li Houqiang, Chen Quan, Liu Haiyan

Affiliations

MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China.

CAS Key Laboratory of GIPAS, School of Information Science and Technology, Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, Anhui, China.

Publication information

Nat Comput Sci. 2022 Jul;2(7):451-462. doi: 10.1038/s43588-022-00273-6. Epub 2022 Jul 21.

DOI:10.1038/s43588-022-00273-6

PMID:38177863

Abstract

Several previously proposed deep learning methods to design amino acid sequences that autonomously fold into a given protein backbone yielded promising results in computational tests but did not outperform conventional energy function-based methods in wet experiments. Here we present the ABACUS-R method, which uses an encoder-decoder network trained using a multitask learning strategy to predict the sidechain type of a central residue from its three-dimensional local environment, which includes, besides other features, the types but not the conformations of the surrounding sidechains. This eliminates the need to reconstruct and optimize sidechain structures, and drastically simplifies the sequence design process. Thus iteratively applying the encoder-decoder to different central residues is able to produce self-consistent overall sequences for a target backbone. Results of wet experiments, including five structures solved by X-ray crystallography, show that ABACUS-R outperforms state-of-the-art energy function-based methods in success rate and design precision.
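
The iterative, self-consistent procedure described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_predict` is a hypothetical stand-in for the trained encoder-decoder (it looks only at the left neighbor's type, not at a three-dimensional local environment), and the update schedule (random-order sweeps until no residue changes) is one plausible choice that the abstract does not specify.

```python
import random
from typing import Callable, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical predictor interface: given a position and the current residue
# types of the whole sequence, return the predicted type for that position.
Predictor = Callable[[int, Sequence[str]], str]


def toy_predict(i: int, seq: Sequence[str]) -> str:
    """Toy stand-in for the network (NOT the ABACUS-R model).

    Position 0 is always 'A'; every other position is predicted as the
    residue type that follows its left neighbor in AMINO_ACIDS (cyclic).
    """
    if i == 0:
        return "A"
    left = AMINO_ACIDS.index(seq[i - 1])
    return AMINO_ACIDS[(left + 1) % len(AMINO_ACIDS)]


def is_self_consistent(seq: Sequence[str], predict: Predictor) -> bool:
    """True if every residue equals the prediction made from its context."""
    return all(predict(i, seq) == seq[i] for i in range(len(seq)))


def design_self_consistent(
    length: int, predict: Predictor, max_sweeps: int = 50, seed: int = 0
) -> str:
    """Update residues one at a time until no prediction changes the sequence."""
    rng = random.Random(seed)
    # Start from a random sequence (an assumed initialization).
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    for _ in range(max_sweeps):
        order = list(range(length))
        rng.shuffle(order)  # visit central residues in random order
        changed = 0
        for i in order:
            new_type = predict(i, seq)  # uses neighbor types only, no rotamers
            if new_type != seq[i]:
                seq[i] = new_type
                changed += 1
        if changed == 0:  # fixed point reached: the sequence is self-consistent
            break
    return "".join(seq)


designed = design_self_consistent(8, toy_predict)
print(designed, is_self_consistent(designed, toy_predict))
```

Because each prediction depends only on residue types and never on sidechain conformations, the loop contains no sidechain packing or optimization step, which is the simplification the abstract emphasizes.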
