Department of Applied Chemistry, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan.
J Chem Inf Model. 2023 Feb 13;63(3):794-805. doi: 10.1021/acs.jcim.2c01298. Epub 2023 Jan 12.
Herein, we propose a de novo direct inverse quantitative structure-property relationship/quantitative structure-activity relationship (QSPR/QSAR) analysis method, based on the chemical variational autoencoder (VAE) and Gaussian mixture regression (GMR) models, to generate molecules with desired values of the target variables (y), i.e., properties and activities. A data set of molecules was analyzed: an encoder transformed simplified molecular input line entry system (SMILES) strings into latent variables (x), while a decoder transformed x back into SMILES strings. A chemical VAE model was used for this analysis, and a GMR model between x and y was constructed for direct inverse analysis. The target y values were input into the GMR model to directly predict the x values. The predicted x values were then input into the decoder of the chemical VAE model, and SMILES strings (i.e., chemical structures of molecules) were obtained as the output, indicating that the proposed method could be used to selectively obtain molecules characterized by the target y values. We confirmed that the proposed method can generate molecules within the target ranges even when the conventional chemical VAE model failed to generate the target molecules.
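The inverse step described in the abstract (target property values in, latent variables out) can be illustrated with a minimal sketch. It assumes a Gaussian mixture model has already been fitted on joint vectors of latent variables and target variables, and it uses the responsibility-weighted conditional mean as the GMR estimate. That is one common choice; the abstract does not state which estimator the authors used. The function names `gaussian_pdf` and `gmr_inverse_predict` and all toy parameters are hypothetical, and the VAE encoder and decoder are omitted because they require a trained model.

```python
import numpy as np


def gaussian_pdf(y, mean, cov):
    """Density of a multivariate normal distribution evaluated at y."""
    d = len(mean)
    diff = y - mean
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)


def gmr_inverse_predict(weights, means, covs, y_target, x_dim):
    """Estimate E[latent | target = y_target] from a joint Gaussian mixture.

    Assumed layout (hypothetical): each joint vector is [latent, target],
    so means has shape (K, x_dim + y_dim) and covs has shape (K, D, D).
    """
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    y_target = np.atleast_1d(np.asarray(y_target, dtype=float))

    cond_means, resp = [], []
    for w, mu, cov in zip(weights, means, covs):
        mu_x, mu_y = mu[:x_dim], mu[x_dim:]
        cov_xy = cov[:x_dim, x_dim:]
        cov_yy = cov[x_dim:, x_dim:]
        # Conditional mean of the latent block given the target, per component.
        cond_means.append(mu_x + cov_xy @ np.linalg.solve(cov_yy, y_target - mu_y))
        # Unnormalized responsibility: weight times marginal density of the target.
        resp.append(w * gaussian_pdf(y_target, mu_y, cov_yy))

    resp = np.array(resp) / np.sum(resp)
    # Responsibility-weighted average of the per-component conditional means.
    return resp @ np.array(cond_means)


# Toy mixture (made-up numbers): 2 latent dimensions, 1 target variable.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0, 0.0], [4.0, 4.0, 10.0]])
cov = np.array([[1.0, 0.0, 0.5], [0.0, 1.0, 0.0], [0.5, 0.0, 1.0]])
covs = np.array([cov, cov])

# Latent vector predicted for a target value of 10.
x_pred = gmr_inverse_predict(weights, means, covs, y_target=10.0, x_dim=2)
```

In the full workflow, a vector such as `x_pred` would then be passed to the trained VAE decoder to obtain a SMILES string.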