Intelligent Systems Program, School of Computing and Information, University of Pittsburgh, Pittsburgh, Pennsylvania 15206, United States.
Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania 15206, United States.
J Chem Inf Model. 2020 Dec 28;60(12):5647-5657. doi: 10.1021/acs.jcim.0c00681. Epub 2020 Nov 3.
Learning accurate drug representations is essential for tasks such as computational drug repositioning and prediction of drug side effects. A drug hierarchy is a valuable resource that encodes knowledge of relations among drugs in a tree-like structure, in which drugs that act on the same organs, treat the same disease, or bind to the same biological target are grouped together. However, its utility in learning drug representations has not yet been explored, and previously described drug representations cannot place novel molecules in a drug hierarchy. Here, we develop a semi-supervised drug embedding that incorporates two sources of information: (1) the underlying chemical grammar inferred from the chemical structures of drugs and drug-like molecules (unsupervised) and (2) the hierarchical relations encoded in an expert-crafted hierarchy of approved drugs (supervised). We use the Variational Auto-Encoder (VAE) framework to encode the chemical structures of molecules, and we use drug-drug similarity information obtained from the hierarchy to induce the clustering of drugs in hyperbolic space, which is well suited to encoding hierarchical relations. Both quantitative and qualitative results indicate that the learned drug embedding can accurately reproduce chemical structures and recapitulate the hierarchical relations among drugs. Furthermore, our approach can infer the pharmacological properties of novel molecules by retrieving similar drugs from the embedding space. We demonstrate that our drug embedding can predict new uses and discover new side effects of existing drugs, and we show that it significantly outperforms comparison methods in both tasks.
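The retrieval step described above, finding the approved drugs closest to a novel molecule in hyperbolic space, can be illustrated with a minimal sketch. The abstract does not state which model of hyperbolic space is used, so the Poincaré ball distance below is an assumption chosen for illustration. The function names, drug names, and 2-D coordinates are all hypothetical and are not taken from the paper.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball.

    Assumption: the Poincare ball model; the abstract does not specify a model.
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def nearest_drugs(query, embeddings, k=2):
    """Return the names of the k embedded drugs closest to the query point."""
    ranked = sorted(
        embeddings,
        key=lambda name: poincare_distance(query, embeddings[name]),
    )
    return ranked[:k]


# Hypothetical 2-D embeddings; a real embedding would be produced by the
# trained encoder and would have many more dimensions.
toy_embeddings = {
    "drug_a": (0.80, 0.05),
    "drug_b": (0.78, 0.12),
    "drug_c": (-0.70, 0.30),
    "drug_d": (0.05, -0.85),
}

# Hypothetical embedding of a novel molecule.
novel_molecule = (0.75, 0.10)
neighbors = nearest_drugs(novel_molecule, toy_embeddings, k=2)
print(neighbors)
```

In this toy setting, the properties of the retrieved neighbors would serve as a proxy for those of the novel molecule. Distances grow rapidly near the boundary of the ball, which is the property that makes hyperbolic space a natural fit for tree-like data.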