School of Public Health, Guangdong Pharmaceutical University, Guangzhou, China.
Guangdong Province Center for Disease Control and Prevention, Guangzhou, China.
Sci Rep. 2021 Apr 13;11(1):8030. doi: 10.1038/s41598-021-87035-y.
Mutagenicity has adverse effects on humans. Conventional methods cannot predict the toxicity of large numbers of compounds simultaneously. Most mutagenicity prediction models are built on a single experimental type and are not supported by data from other experimental combinations, which limits their scope of application and their predictive ability. In this study, we partitioned data from GENE-TOX, CPDB, and the Chemical Carcinogenesis Research Information System according to the weight-of-evidence method for modelling. In accordance with the ICH guideline, our data set includes grouped in vivo and in vitro experiments as well as prokaryotic and eukaryotic cell experiments. We compared the two experimental combinations mentioned in the weight-of-evidence method and reintegrated the experimental data into three groups. Nine sub-models and three fusion models were established using random forest (RF), support vector machine (SVM), and back propagation (BP) neural network algorithms. When base models built with the same algorithm were fused according to the ensemble rules, all fusion models performed well. The RF, SVM, and BP fusion models reached prediction accuracies of 83.4%, 80.5%, and 79.0%, respectively, with areas under the curve (AUC) of 0.853, 0.897, and 0.865, respectively. The established fusion QSAR models can therefore serve as an early warning system for the mutagenicity of compounds.
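The fusion step described in the abstract, in which sub-models trained with the same algorithm on the three experimental groups are combined into one prediction, can be illustrated with a minimal sketch. The abstract does not state the ensemble rules, so the strict majority vote used here is an assumption for illustration only, not the authors' method. The function names (`fuse_majority`, `accuracy`) and the toy predictions are hypothetical.

```python
from typing import List, Sequence


def fuse_majority(sub_model_preds: Sequence[Sequence[int]]) -> List[int]:
    """Fuse binary calls (1 = mutagenic, 0 = non-mutagenic) by strict majority vote.

    ASSUMPTION: majority voting stands in for the paper's unspecified ensemble rules.
    """
    if not sub_model_preds:
        raise ValueError("need at least one sub-model")
    n_compounds = len(sub_model_preds[0])
    if any(len(p) != n_compounds for p in sub_model_preds):
        raise ValueError("all sub-models must score the same compounds")

    fused = []
    for votes in zip(*sub_model_preds):
        # A compound is called positive only if more than half the sub-models agree.
        fused.append(1 if 2 * sum(votes) > len(votes) else 0)
    return fused


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of compounds whose fused call matches the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy predictions from three sub-models (one per experimental group),
# all built with the same algorithm, for five compounds.
group_a = [1, 0, 1, 1, 0]
group_b = [1, 1, 0, 1, 0]
group_c = [0, 0, 1, 1, 1]
labels = [1, 0, 1, 1, 0]  # made-up reference labels, not data from the study

fused = fuse_majority([group_a, group_b, group_c])
print(fused, accuracy(labels, fused))
```

In this toy case each sub-model misclassifies a different compound, so the vote corrects every individual error. That is the general motivation for fusing base models, though it says nothing about the figures reported in the study.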