Ochiai Toshiki, Inukai Tensei, Akiyama Manato, Furui Kairi, Ohue Masahito, Matsumori Nobuaki, Inuki Shinsuke, Uesugi Motonari, Sunazuka Toshiaki, Kikuchi Kazuya, Kakeya Hideaki, Sakakibara Yasubumi
Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa, 223-8522, Japan.
Department of Computer Science, School of Computing, Tokyo Institute of Technology, Yokohama, Kanagawa, 226-8501, Japan.
Commun Chem. 2023 Nov 16;6(1):249. doi: 10.1038/s42004-023-01054-6.
The structural diversity of chemical libraries, which are systematic collections of compounds with the potential to bind to biomolecules, can be represented by a chemical latent space. A chemical latent space is a projection of compound structures into a mathematical space based on several molecular features; it expresses the structural diversity within a compound library, which makes it possible to explore a broader chemical space and generate novel compound structures as drug candidates. In this study, we developed a deep-learning method called NP-VAE (Natural Product-oriented Variational Autoencoder), based on a variational autoencoder, for handling hard-to-analyze datasets from DrugBank and large molecular structures such as natural compounds with chirality, an essential factor in the 3D complexity of compounds. NP-VAE succeeded in constructing a chemical latent space from large compounds that existing methods could not handle, achieved higher reconstruction accuracy, and demonstrated stable performance as a generative model across various indices. Furthermore, by exploring the acquired latent space, we succeeded in comprehensively analyzing a compound library containing natural compounds and in generating novel compound structures with optimized functions.
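To make the idea of a variational-autoencoder latent space more concrete, the sketch below shows three generic building blocks: the closed-form KL term that regularizes a diagonal-Gaussian latent code, the reparameterized sampling step, and a straight-line walk between two latent points. It is an illustration under stated assumptions, not the NP-VAE architecture: the function names, the 3-dimensional latent space, and the encoder outputs `mu_a`, `mu_b`, `log_var_a`, `log_var_b` are all hypothetical, and no molecular encoder or decoder is included.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) in each dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def interpolate(z_a, z_b, steps):
    """Evenly spaced points on the line from z_a to z_b, endpoints included.

    Requires steps >= 2.
    """
    return [
        [a + (b - a) * t / (steps - 1) for a, b in zip(z_a, z_b)]
        for t in range(steps)
    ]


# Hypothetical encoder outputs for two compounds (illustrative values only).
mu_a, log_var_a = [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]
mu_b, log_var_b = [1.0, -2.0, 0.5], [-1.0, -1.0, -1.0]

rng = random.Random(0)                        # fixed seed for repeatability
z_a = reparameterize(mu_a, log_var_a, rng)    # one stochastic latent sample
path = interpolate(mu_a, mu_b, steps=5)       # candidate points to decode
```

In a full model, each point in `path` would be passed to a trained decoder to propose a structure lying between the two compounds; that decoding step is what a method such as NP-VAE has to provide and is deliberately left out here.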