Fan Xiaqiong, Fang Senlin, Li Zhengyan, Ji Hongchao, Yue Minghan, Li Jiamin, Ren Xiaozhen
School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou 450001, China.
Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China.
Int J Mol Sci. 2025 Apr 23;26(9):3980. doi: 10.3390/ijms26093980.
Recent studies have demonstrated that machine learning-based generative models can create novel molecules with desirable properties. Among them, the Conditional Variational Autoencoder (CVAE) is a powerful approach for generating molecules with desired physicochemical and pharmacological properties. However, the CVAE's latent space remains a black box, making it difficult to understand the relationship between the latent space and molecular properties. To address this issue, we propose the Interpretable Conditional Variational Autoencoder (ICVAE), which introduces a modified loss function that correlates the latent values with molecular properties. The ICVAE establishes a linear mapping between latent variables and molecular properties. This linearity improves interpretability by assigning clear semantic meaning to latent dimensions, and it also provides a practical advantage: molecular attributes can be manipulated directly through simple coordinate shifts in latent space, rather than through opaque, black-box optimization algorithms. Our experimental results show that the ICVAE can linearly relate one or multiple molecular properties to the latent values and can generate molecules with precise properties by controlling those values. The ICVAE's interpretability offers insight into the molecular generation process, making it a promising approach for drug discovery and material design.
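The abstract does not give the form of the modified loss, so the following is only a minimal sketch of one way such an objective could look, not the authors' implementation. It assumes the usual CVAE terms (reconstruction plus KL divergence) with an added squared-error penalty that ties the first few latent coordinates to standardized property values. The names `icvae_style_loss`, `shift_latent`, and `weight`, and the choice of an identity (slope 1) linear map, are all hypothetical.

```python
import numpy as np


def icvae_style_loss(recon_nll, mu, logvar, z, props, weight=1.0):
    """Hypothetical ICVAE-style objective (an assumption, not the paper's formula).

    recon_nll : (batch,) reconstruction negative log-likelihood per molecule
    mu, logvar: (batch, latent_dim) parameters of the approximate posterior
    z         : (batch, latent_dim) sampled latent vectors
    props     : (batch, n_props) standardized molecular property values
    """
    n_props = props.shape[1]
    # Standard VAE term: KL divergence between N(mu, exp(logvar)) and N(0, I).
    kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)
    # Assumed alignment penalty: the first n_props latent coordinates
    # are pushed toward the corresponding property values.
    align = np.sum((z[:, :n_props] - props) ** 2, axis=1)
    return float(np.mean(recon_nll + kl + weight * align))


def shift_latent(z, dim, target_value):
    """Return a copy of z with one property-aligned coordinate set to a target."""
    z_new = np.array(z, dtype=float, copy=True)
    z_new[..., dim] = target_value
    return z_new
```

Under these assumptions, steering a property at generation time amounts to calling `shift_latent(z, dim=0, target_value=...)` and passing the result to the trained decoder, which is omitted here. The other latent coordinates are left untouched, which is the sense in which a "simple coordinate shift" replaces a separate optimization loop.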