Idakwo Gabriel, Luttrell Joseph, Chen Minjun, Hong Huixiao, Zhou Zhaoxian, Gong Ping, Zhang Chaoyang
a School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, Mississippi, USA.
b Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas, USA.
J Environ Sci Health C Environ Carcinog Ecotoxicol Rev. 2018;36(4):169-191. doi: 10.1080/10590501.2018.1537118. Epub 2019 Jan 10.
In silico toxicity prediction plays an important role in regulatory decision making and in the selection of leads in drug design, as in vitro and in vivo methods are often limited by ethics, time, budget, and other resources. Many computational methods have been employed to predict the toxicity profiles of chemicals. This review provides a detailed end-to-end overview of the application of machine learning algorithms to Structure-Activity Relationship (SAR)-based predictive toxicology. From raw data to model validation, the importance of data quality is stressed, as it greatly affects the predictive power of derived models. Commonly overlooked challenges such as data imbalance, activity cliffs, model evaluation, and definition of the applicability domain are highlighted, and plausible solutions for alleviating these challenges are discussed.