College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd, Hangzhou, Zhejiang, 310058, China; Polytechnic Institute, Zhejiang University, 269 Shixiang Rd, Hangzhou, Zhejiang, 310015, China.
College of Pharmaceutical Sciences, Zhejiang University, 866 Yuhangtang Rd, Hangzhou, Zhejiang, 310058, China.
Comput Biol Med. 2022 Jul;146:105573. doi: 10.1016/j.compbiomed.2022.105573. Epub 2022 Apr 30.
Chromosome aberration (CA) is a serious form of compound genotoxicity that can lead to carcinogenicity and developmental side effects. In this study, we developed a quantitative structure-activity relationship (QSAR) model for CA prediction using artificial intelligence methods. The model was built on an enlarged data set of 3208 compounds by optimizing machine learning and deep learning algorithms through iterative hyperparameter tuning, using multiple molecular fingerprint descriptors in combination with drug-like molecular properties (MP) selected by the entropy weight method, all on an open-source Python platform. In addition, a molecular similarity search and a molecular connectivity index, used as an additional descriptor, were introduced to distinguish highly similar compounds so that their CA outcome is predicted correctly. The final CA-(Q)SAR model achieved a prediction accuracy of 80.6%. The bias of the final model is about 0.9793. Based on this model, further data analysis identified typical structural features within numerical intervals (MPI) of the molecular properties MW, XlogP, and TPSA that are associated with potential CA or non-CA toxicity at a normalized occurrence probability (NOP) above 70%, which may provide useful clues for designing leads or drug candidates devoid of CA genotoxicity.
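The abstract names the entropy weight method as the way molecular properties were screened but gives no details. The following is a minimal sketch of the textbook form of that method, not the authors' implementation: each descriptor column is min-max scaled, converted to proportions, and weighted by how far its Shannon entropy falls below the maximum. The function name `entropy_weights`, the handling of constant columns, and the toy descriptor values are all assumptions made for illustration.

```python
import math


def entropy_weights(rows):
    """Return one weight per column using the standard entropy weight method.

    rows: list of equal-length lists of floats (samples x descriptors).
    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two samples")
    m = len(rows[0])
    k = 1.0 / math.log(n)  # scales entropy into [0, 1]

    divergences = []
    for j in range(m):
        column = [row[j] for row in rows]
        low, high = min(column), max(column)
        span = high - low
        if span == 0:
            # Assumption: a constant descriptor carries no information.
            divergences.append(0.0)
            continue
        scaled = [(v - low) / span for v in column]  # min-max scaling
        total = sum(scaled)
        probs = [s / total for s in scaled]
        # Terms with p == 0 contribute nothing (limit of p * ln p is 0).
        entropy = -k * sum(p * math.log(p) for p in probs if p > 0)
        divergences.append(1.0 - entropy)

    total_divergence = sum(divergences)
    if total_divergence == 0:
        # Assumption: fall back to equal weights if no column varies.
        return [1.0 / m] * m
    return [d / total_divergence for d in divergences]


if __name__ == "__main__":
    # Made-up toy values standing in for MW, XlogP, and TPSA.
    toy = [
        [180.2, 1.2, 63.6],
        [299.4, 3.9, 12.5],
        [151.2, 0.5, 49.3],
        [454.4, 2.8, 95.0],
    ]
    for name, weight in zip(["MW", "XlogP", "TPSA"], entropy_weights(toy)):
        print(f"{name}: {weight:.3f}")
```

A descriptor whose values are spread unevenly across compounds has lower entropy and so receives a larger weight. How the paper turned such weights into a selection of properties (for example, a cutoff) is not stated in the abstract.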