Yamauchi Keitaro, Nakatsuji Hirotaka, Kamishima Takaaki, Koseki Yoshitaka, Kubo Masaki, Kasai Hitoshi
Institute of Multidisciplinary Research for Advanced Materials (IMRAM), Tohoku University, Aoba-ku, Sendai, Miyagi, 980-8577, Japan.
East Tokyo Laboratory, Genesis Research Institute, Inc., 717-86 Futamata, Ichikawa, Chiba, 272-0001, Japan.
Sci Rep. 2024 Feb 19;14(1):4106. doi: 10.1038/s41598-024-53888-2.
Machine learning has the potential to improve the environment for developing antimicrobial agents. For machine learning to be practical, molecular information must be converted into an appropriate descriptor. An overly informative descriptor demands enormous computation time and extensive experiments to gather data, whereas an insufficiently informative descriptor yields models of questionable validity. In this study, we used a descriptor that focuses solely on substituents. The type and position of the substituents on molecules that share a 4-quinolone structure (11,879 compounds) were converted into a combined substituent number (CSN). Although the CSN contains no information on the detailed structure, physical properties, or quantum chemistry of the molecules, the prediction model built by machine learning on the CSN achieved a sufficient coefficient of determination (0.719 for the training dataset and 0.519 for the validation dataset). In addition, the CSN makes it easy to construct a library of unknown molecules with a relatively consistent structure by recombining substituents (32,079,318 compounds), and to screen that library. The validity of the prediction model was further confirmed by growth inhibition experiments against E. coli using the model-suggested molecules and commercially available antimicrobial agents.
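The abstract does not spell out how the CSN is computed. As a rough illustration only, the Python sketch below makes several assumptions. Each substitution site on the 4-quinolone core has its own vocabulary of substituents. The CSN is taken to be the tuple of per-site substituent IDs. "Recombination" is treated as the Cartesian product of those vocabularies, with combinations already in the dataset discarded. The site labels, toy substituents, and helper names (`build_vocab`, `to_csn`, `enumerate_library`, `r_squared`) are all hypothetical. The machine-learning model itself is not reproduced; the sketch shows only the coefficient of determination used to evaluate such a model.

```python
from itertools import product

# Hypothetical substitution sites on the 4-quinolone core. Assumption: the
# usual quinolone positions; the paper's actual site list may differ.
SITES = ("N1", "C2", "C3", "C5", "C6", "C7", "C8")


def build_vocab(molecules):
    """Assign each substituent seen at a site an integer ID (per site, sorted)."""
    vocab = {}
    for site in SITES:
        seen = sorted({mol[site] for mol in molecules})
        vocab[site] = {sub: i for i, sub in enumerate(seen)}
    return vocab


def to_csn(mol, vocab):
    """Encode a molecule as a tuple of per-site substituent IDs (a CSN-like key)."""
    return tuple(vocab[site][mol[site]] for site in SITES)


def enumerate_library(vocab, known):
    """Recombine substituents across all sites, skipping CSNs already in `known`."""
    ranges = [range(len(vocab[site])) for site in SITES]
    for csn in product(*ranges):
        if csn not in known:
            yield csn


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (y_true must vary)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy, hand-written substituent tables (illustrative, not data from the paper).
def _mol(n1, c8):
    return {"N1": n1, "C2": "H", "C3": "COOH", "C5": "H",
            "C6": "F", "C7": "piperazin-1-yl", "C8": c8}


TRAIN = [_mol("cyclopropyl", "H"), _mol("ethyl", "H"), _mol("cyclopropyl", "OMe")]

vocab = build_vocab(TRAIN)
known = {to_csn(m, vocab) for m in TRAIN}
print(list(enumerate_library(vocab, known)))  # [(1, 0, 0, 0, 0, 0, 1)]
print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))  # approximately 0.98
```

Under these assumptions, the library size before filtering is the product of the per-site vocabulary sizes. This is how a modest set of substituents can expand into millions of candidates. The integer IDs are arbitrary labels, so a regression model would more naturally consume them after one-hot encoding than as raw numbers. That is a design suggestion, not a detail taken from the paper.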