Kaneko Hiromasa
Department of Applied Chemistry, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan.
ACS Omega. 2023 Jun 5;8(24):21781-21786. doi: 10.1021/acsomega.3c01332. eCollection 2023 Jun 20.
For inverse QSAR/QSPR in conventional molecular design, several chemical structures must be generated and their molecular descriptors must be calculated. However, there is no one-to-one correspondence between the generated chemical structures and molecular descriptors. In this paper, molecular descriptors, structure generation, and inverse QSAR/QSPR based on self-referencing embedded strings (SELFIES), a 100% robust molecular string representation, are proposed. A one-hot vector is converted from SELFIES to SELFIES descriptors x, and an inverse analysis of the QSAR/QSPR model y = f(x) with the objective variable y and molecular descriptor x is conducted. Thus, x values that achieve a target y value are obtained. Based on these values, SELFIES strings or molecules are generated, meaning that inverse QSAR/QSPR is performed successfully. The SELFIES descriptors and SELFIES-based structure generation are verified using datasets of actual compounds. The successful construction of SELFIES-descriptor-based QSAR/QSPR models with predictive abilities comparable to those of models based on other fingerprints is confirmed. A large number of molecules with one-to-one relationships with the values of the SELFIES descriptors are generated. Furthermore, as a case study of inverse QSAR/QSPR, molecules with target y values are generated successfully. The Python code for the proposed method is available at https://github.com/hkaneko1985/dcekit.
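As a rough illustration of the one-to-one relationship between a SELFIES string and a one-hot descriptor vector x, the following minimal sketch encodes a string token by token and decodes it again. It is not the dcekit implementation: the helper names (`split_tokens`, `build_alphabet`, `to_descriptor`, `from_descriptor`), the fixed maximum length, and the argmax decoding rule are all assumptions made for this example, and the two sample strings are hard-coded rather than derived from SMILES.

```python
import re

PAD = "[nop]"  # padding symbol (assumed here; mirrors the no-op token used by SELFIES)


def split_tokens(selfies_string):
    """Split a SELFIES string into its bracketed symbols."""
    return re.findall(r"\[[^\]]*\]", selfies_string)


def build_alphabet(selfies_strings):
    """Collect every symbol seen in the data set; PAD always gets index 0."""
    symbols = {tok for s in selfies_strings for tok in split_tokens(s)}
    symbols.discard(PAD)
    return [PAD] + sorted(symbols)


def to_descriptor(selfies_string, alphabet, max_len):
    """Flattened one-hot vector x of length max_len * len(alphabet)."""
    index = {sym: i for i, sym in enumerate(alphabet)}
    tokens = split_tokens(selfies_string)
    if len(tokens) > max_len:
        raise ValueError("string is longer than max_len symbols")
    tokens = tokens + [PAD] * (max_len - len(tokens))
    x = []
    for tok in tokens:
        row = [0] * len(alphabet)
        row[index[tok]] = 1
        x.extend(row)
    return x


def from_descriptor(x, alphabet, max_len):
    """Map x back to a SELFIES string by taking the largest entry per position.

    Using argmax (an assumption of this sketch) means real-valued x, such as
    values returned by an inverse analysis, can also be decoded.
    """
    n = len(alphabet)
    tokens = []
    for pos in range(max_len):
        row = x[pos * n:(pos + 1) * n]
        best = max(range(n), key=row.__getitem__)
        if alphabet[best] != PAD:
            tokens.append(alphabet[best])
    return "".join(tokens)


# Hard-coded sample strings (benzene and ethanol in SELFIES notation).
samples = ["[C][=C][C][=C][C][=C][Ring1][=Branch1]", "[C][C][O]"]
alphabet = build_alphabet(samples)
max_len = max(len(split_tokens(s)) for s in samples)

x = to_descriptor(samples[1], alphabet, max_len)
print(len(x), from_descriptor(x, alphabet, max_len))
```

Because every position holds exactly one symbol, encoding followed by decoding returns the original string, which is the property that lets x values from an inverse analysis be turned back into structures. Converting between SMILES and SELFIES would require a dedicated package and is left out of this sketch.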