Guo Xiaojie, Du Yuanqi, Tadepalli Sivani, Zhao Liang, Shehu Amarda
Department of Information Sciences and Technology, George Mason University, Fairfax, VA 22030, USA.
Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.
Bioinform Adv. 2021 Nov 29;1(1):vbab036. doi: 10.1093/bioadv/vbab036. eCollection 2021.
Modeling the structural plasticity of protein molecules remains challenging. Most research has focused on obtaining a single biologically active structure. This includes the recent AlphaFold2, which has been hailed as a breakthrough for protein modeling. Computing one structure does not suffice to understand how proteins modulate their interactions and even evade our immune system. Revealing the full structure space available to a protein remains an open problem. Data-driven approaches that learn to generate tertiary structures are garnering increasing attention. These approaches exploit the ability to represent tertiary structures as contact or distance maps and draw direct analogies with images to harness convolution-based generative adversarial frameworks from computer vision. Because such opportunistic analogies do not capture highly structured data, current deep models struggle to generate physically realistic tertiary structures.
We present novel deep generative models that build upon the graph variational autoencoder framework. In contrast to existing literature, we represent tertiary structures as 'contact' graphs, which allows us to leverage graph-generative deep learning. Our models capture rich local and distal constraints and additionally compute disentangled latent representations that reveal the impact of individual latent factors. This elucidates what the factors control and makes our models more interpretable. Rigorous comparative evaluation along various metrics shows that the models we propose advance the state of the art. While there is still much ground to cover, the work presented here is an important first step, and graph-generative frameworks promise to bring us closer to the goal of unraveling the exquisite structural complexity of protein molecules.
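To make the 'contact' graph representation concrete, the following is a minimal sketch of how such a graph could be derived from one tertiary structure. It is not taken from the authors' code: the function name `contact_graph`, the use of one coordinate per residue, and the 8 angstrom cutoff are all assumptions chosen for illustration.

```python
import numpy as np


def contact_graph(coords, threshold=8.0):
    """Build a contact graph from per-residue 3D coordinates.

    Hypothetical helper: the 8.0 angstrom cutoff and the choice of one
    coordinate per residue are illustrative assumptions, not values
    stated in the abstract.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all residues (the distance map).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Two residues are in contact when they are closer than the cutoff.
    adj = (dist < threshold).astype(np.int8)
    np.fill_diagonal(adj, 0)  # a residue is not in contact with itself
    # Undirected edge list taken from the upper triangle of the adjacency matrix.
    i, j = np.nonzero(np.triu(adj, k=1))
    edges = list(zip(i.tolist(), j.tolist()))
    return adj, edges


# Toy chain: four residues on a line, spaced 3.8 angstroms apart.
coords = np.array([[0.0, 0.0, 0.0],
                   [3.8, 0.0, 0.0],
                   [7.6, 0.0, 0.0],
                   [11.4, 0.0, 0.0]])
adj, edges = contact_graph(coords)
print(edges)
```

In this toy chain every pair of residues is in contact except the first and last, which lie 11.4 angstroms apart. The symmetric adjacency matrix `adj` is the kind of input a graph-based generative model would consume, whereas an image-style model would treat the same matrix as a grid of pixels.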
Code is available at https://github.com/anonymous1025/CO-VAE.
Supplementary data are available online.