Goel Mansi, Amawate Arav, Singh Angadjeet, Bagler Ganesh
Infosys Centre for Artificial Intelligence, Indraprastha Institute of Information Technology Delhi (IIIT-Delhi), New Delhi, 110020, India; Department of Computational Biology, Indraprastha Institute of Information Technology Delhi (IIIT-Delhi), New Delhi, 110020, India; Center of Excellence in Healthcare, Indraprastha Institute of Information Technology Delhi (IIIT-Delhi), New Delhi, 110020, India.
Department of Computer Science, Indraprastha Institute of Information Technology Delhi (IIIT-Delhi), New Delhi, 110020, India.
Chemosphere. 2025 Feb;370:143900. doi: 10.1016/j.chemosphere.2024.143900. Epub 2024 Dec 24.
Predicting the toxicity of molecules is essential in drug discovery, environmental protection, and industrial chemical management. Traditional experimental methods are time-consuming and costly, whereas computational models offer an efficient alternative. In this study, we introduce ToxinPredictor, a machine learning-based model that predicts the toxicity of small molecules from their structural properties. The model was trained on a curated dataset of 7550 toxic and 6514 non-toxic molecules, using feature selection and dimensionality reduction techniques such as Boruta and PCA. The best-performing model, a Support Vector Machine (SVM), achieved state-of-the-art results with an AUROC of 91.7%, an F1-score of 84.9%, and an accuracy of 85.4%, outperforming existing solutions. SHAP analysis was applied to the SVM model to identify the molecular descriptors that contribute most to its toxicity predictions, enhancing interpretability. Despite challenges related to data quality, ToxinPredictor provides a reliable framework for toxicity risk assessment, paving the way for safer drug development and improved environmental health assessments. We also created a user-friendly webserver, ToxinPredictor (https://cosylab.iiitd.edu.in/toxinpredictor), to facilitate the search and prediction of toxic compounds.
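For readers less familiar with the three reported metrics, the following is a minimal sketch of how accuracy, F1-score, and AUROC are computed for a binary toxic (1) versus non-toxic (0) classifier. It is not the authors' code: the function names, the toy labels and scores, and the 0.5 decision threshold are assumptions made purely for illustration. The only figures taken from the abstract are the class counts, which are used to show the accuracy a trivial "always toxic" predictor would reach on the full dataset (the composition of the actual test split is not stated).

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive (toxic) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random toxic molecule receives a higher score
    than a random non-toxic one (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration, not from the study).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"F1-score = {f1_score(y_true, y_pred):.3f}")
print(f"AUROC    = {auroc(y_true, scores):.3f}")

# Class counts from the abstract: accuracy of always predicting "toxic"
# on the full dataset, as a reference point for the reported 85.4%.
majority_baseline = 7550 / (7550 + 6514)
print(f"majority-class baseline = {majority_baseline:.3f}")
```

Accuracy and F1-score depend on the chosen decision threshold, whereas AUROC is computed from the ranking of the scores alone, which is why the three numbers can differ for the same model.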