Physics Department, Faculty of Physics, Lomonosov Moscow State University, Leninskie Gory, 1-2, Moscow, Russia, 119991.
Sci Rep. 2021 Jan 11;11(1):321. doi: 10.1038/s41598-020-79682-4.
Drug discovery for a protein target is a laborious, long, and costly process. Machine learning approaches, and deep generative networks in particular, can substantially reduce development time and costs. However, the majority of methods require prior knowledge of protein binders, their physicochemical characteristics, or the protein's three-dimensional structure. The method proposed in this work generates novel molecules with a predicted ability to bind a target protein, relying only on its amino acid sequence. We consider target-specific de novo drug design as a translation problem between the amino acid "language" and the simplified molecular-input line-entry system (SMILES) representation of the molecule. To tackle this problem, we apply the Transformer neural network architecture, a state-of-the-art approach to sequence transduction tasks. The Transformer is based on self-attention, which captures long-range dependencies between items in a sequence. The model generates realistic, diverse compounds with structural novelty. Their computed physicochemical properties and common drug-discovery metrics fall within the plausible drug-like range of values.
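The self-attention operation that lets the Transformer relate distant positions in a sequence (whether amino acids or SMILES tokens) can be sketched as scaled dot-product attention. The following is a minimal NumPy illustration, not the authors' implementation; dimensions and names are chosen for clarity.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention over (seq_len, d_k) query/key/value arrays."""
    d_k = Q.shape[-1]
    # Pairwise similarity between every query position and every key position,
    # scaled by sqrt(d_k) to keep the softmax in a well-conditioned regime.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over key positions: each row becomes a probability distribution.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted mixture of all value vectors,
    # so any position can draw on any other, however far apart they are.
    return weights @ V

# Toy self-attention: 4 tokens with 8-dim embeddings, using the same
# matrix X as queries, keys, and values (the "self" in self-attention).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (4, 8): one mixed representation per input position
```

In the full architecture this operation is applied with learned linear projections for Q, K, and V, in parallel across multiple heads, but the mixing step above is the mechanism that gives the model its long-range view of the sequence.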