Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK.
Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK.
Sci Rep. 2018 Nov 1;8(1):16189. doi: 10.1038/s41598-018-34533-1.
The design of novel proteins has many applications but remains an attritional process with success in isolated cases. Meanwhile, deep learning technologies have exploded in popularity in recent years and are increasingly applicable to biology due to the rise in available data. We attempt to link protein design and deep learning by using variational autoencoders to generate protein sequences conditioned on desired properties. Potential copper and calcium binding sites are added to non-metal binding proteins without human intervention and compared to a hidden Markov model. In another use case, a grammar of protein structures is developed and used to produce sequences for a novel protein topology. One candidate structure is found to be stable by molecular dynamics simulation. The ability of our model to confine the vast search space of protein sequences and to scale easily has the potential to assist in a variety of protein design tasks.
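To make the idea of generating sequences "conditioned on desired properties" concrete, here is a minimal sketch of the forward pass of a conditional variational autoencoder. It is not the authors' model: the abstract does not specify an architecture, so the single hidden layer, the layer sizes, the two-element condition vector, and all function names below are assumptions chosen for illustration. The weights are random and untrained, so the output sequence is meaningless; the point is only to show where the condition vector enters the encoder and the decoder.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq):
    """Flatten a sequence into a one-hot vector of length len(seq) * 20."""
    vec = [0.0] * (len(seq) * len(AMINO_ACIDS))
    for pos, aa in enumerate(seq):
        vec[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return vec


def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(vec, weights):
    """Vector-matrix product: vec (length rows) times weights (rows x cols)."""
    return [sum(v * row[j] for v, row in zip(vec, weights))
            for j in range(len(weights[0]))]


def init_params(seq_len, cond_dim, latent_dim, hidden_dim, rng):
    n_in = seq_len * len(AMINO_ACIDS)
    return {
        "enc_h": random_matrix(n_in + cond_dim, hidden_dim, rng),
        "enc_mu": random_matrix(hidden_dim, latent_dim, rng),
        "enc_logvar": random_matrix(hidden_dim, latent_dim, rng),
        "dec_h": random_matrix(latent_dim + cond_dim, hidden_dim, rng),
        "dec_out": random_matrix(hidden_dim, n_in, rng),
    }


def encode(params, x, cond):
    """Map (sequence, condition) to the mean and log-variance of q(z | x, c)."""
    h = [math.tanh(v) for v in linear(x + cond, params["enc_h"])]
    return linear(h, params["enc_mu"]), linear(h, params["enc_logvar"])


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def decode(params, z, cond):
    """Map (latent, condition) to per-position residue probabilities."""
    h = [math.tanh(v) for v in linear(z + cond, params["dec_h"])]
    logits = linear(h, params["dec_out"])
    n = len(AMINO_ACIDS)
    probs = []
    for start in range(0, len(logits), n):
        row = logits[start:start + n]
        top = max(row)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


def generate(params, cond, latent_dim, rng):
    """Draw z from the prior N(0, I) and decode it under the given condition."""
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    probs = decode(params, z, cond)
    return "".join(AMINO_ACIDS[row.index(max(row))] for row in probs)


rng = random.Random(0)
params = init_params(seq_len=12, cond_dim=2, latent_dim=4, hidden_dim=8, rng=rng)
metal_binding = [1.0, 0.0]  # hypothetical condition flags, e.g. [binds metal, does not]

# Reconstruction path: encode a sequence, sample a latent, decode it again.
mu, logvar = encode(params, one_hot("MKTAYIAKQRQI"), metal_binding)
z = reparameterize(mu, logvar, rng)
reconstruction = decode(params, z, metal_binding)

# Generation path: sample from the prior and decode under the desired condition.
sample = generate(params, metal_binding, latent_dim=4, rng=rng)
print(sample)
```

In a trained model of this kind, changing the condition vector at decoding time is what steers the generated sequences toward the requested property, and the low-dimensional latent space is what confines the search over sequences.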