Zhang Senpeng, Zhao Dongyu, Cui Qinghua
Department of Biomedical Informatics, State Key Laboratory of Vascular Homeostasis and Remodeling, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing 100191, People's Republic of China.
ACS Omega. 2024 Apr 12;9(16):17839-17847. doi: 10.1021/acsomega.3c07682. eCollection 2024 Apr 23.
Molecular toxicity is a critical consideration in drug development, so computational models that evaluate the toxicity of small molecules are important. The accuracy of toxicity prediction depends largely on the quality of the molecular representation; however, current methods do not address this issue well. Here, we introduce a new metric, gap-Δenergy, which is designed to quantify the intermolecular bond energy difference with atom distance. We find significant variation in the gap-Δenergy distribution among different types of molecules and show that the metric can distinguish toxic small molecules. We collected data sets of toxic and exogenous small molecules and propose a novel index, global toxicity, to evaluate the overall toxicity of a molecule. Based on molecular descriptors and the proposed gap-Δenergy metric, we further constructed machine learning models trained on 7816 small molecules. The XGBoost-based model performed best, with an AUC of 0.965 and an F1 score of 0.849 on the test set (1954 small molecules). It outperformed the model that did not use gap-Δenergy features, with a 3.2% increase in sensitivity.
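For readers unfamiliar with the evaluation metrics named in the abstract, the following minimal Python sketch shows how sensitivity, F1, and AUC are conventionally computed for a binary toxic/non-toxic classifier. It is an illustration only and not the authors' pipeline. The labels and scores are made-up toy values, the function names are hypothetical, and the 0.5 decision threshold is an assumption. The only figures taken from the abstract are the training and test set sizes, which correspond to an 80/20 split.

```python
from typing import Sequence, Tuple


def sensitivity_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, F1) for binary labels, where 1 means toxic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Sensitivity (recall): fraction of truly toxic molecules that were flagged.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall, written in count form.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return sensitivity, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Set sizes reported in the abstract; the resulting fraction is 0.2.
test_fraction = 1954 / (7816 + 1954)

# Toy data (NOT from the study): 1 = toxic, 0 = non-toxic.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, f1 = sensitivity_and_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"test fraction={test_fraction:.2f} sensitivity={sens:.3f} F1={f1:.3f} AUC={auc:.4f}")
```

Sensitivity and F1 depend on the chosen threshold, whereas AUC is computed from the ranking of scores alone, which is why abstracts such as this one typically report them side by side.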