Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland 21702, USA.
Chem Res Toxicol. 2012 Oct 15;25(10):2216-26. doi: 10.1021/tx300279f. Epub 2012 Sep 26.
Toxicological experiments in animals are carried out to determine the type and severity of any potential toxic effect associated with a new lead compound. The collected data are then used to extrapolate the effects on humans and to determine initial dose regimens for clinical trials. The underlying assumption is that the severity of toxic effects in animals is correlated with that in humans. However, toxic effects generally correlate poorly across species. Thus, it is more advantageous to predict the toxicological effects of a compound on humans directly from the human toxicological data of related compounds. However, many popular quantitative structure-activity relationship (QSAR) methods build a single global model by fitting all training data. Such methods appear inappropriate for predicting the toxicological effects of structurally diverse compounds, because the observed effects may originate from very different and mostly unknown molecular mechanisms. In this article, we apply locally weighted learning methods, such as k-nearest neighbors, to human maximum recommended daily dose data and demonstrate that they are well suited for predicting the toxicological effects of structurally diverse compounds. We also show that the k-nearest neighbor method has a significant flaw: it always uses a constant number of nearest neighbors to make a prediction for a target compound, irrespective of whether those neighbors are structurally similar enough to the target compound to ensure that they share its mechanism of action. To remedy this flaw, we propose and implement a variable number nearest neighbor method. Its advantages over other QSAR methods include (1) more reliable predictions, achieved by applying a tighter molecular distance threshold, and (2) automatic detection of when a prediction should not be made because the compound lies outside the applicability domain.
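The contrast between a fixed-k neighbor average and a variable number of neighbors can be illustrated with a small sketch. The abstract does not give the method's formulas, so the following choices are all assumptions: fingerprints are represented as sets of "on" bits, distances are Tanimoto distances, neighbors are admitted within a distance threshold `d0`, and those neighbors receive Gaussian weights with smoothing parameter `h`. The names `tanimoto_distance`, `knn_predict` and `vnn_predict` are hypothetical and are not the authors' code.

```python
import math
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical representation: a fingerprint is the set of "on" bit indices.
Fingerprint = FrozenSet[int]
TrainingSet = List[Tuple[Fingerprint, float]]


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Return 1 - Tanimoto similarity between two bit-set fingerprints."""
    union = len(a | b)
    if union == 0:
        return 0.0  # treat two empty fingerprints as identical
    return 1.0 - len(a & b) / union


def knn_predict(query: Fingerprint, train: TrainingSet, k: int) -> float:
    """Fixed-k nearest neighbors: always averages exactly k neighbors,
    no matter how far they are from the query."""
    ranked = sorted(train, key=lambda item: tanimoto_distance(query, item[0]))
    neighbors = ranked[:k]
    return sum(y for _, y in neighbors) / len(neighbors)


def vnn_predict(query: Fingerprint, train: TrainingSet,
                d0: float, h: float) -> Optional[float]:
    """Variable-number nearest neighbors (sketch): use every training compound
    within distance d0, weighted by exp(-(d/h)^2). Returns None when no
    compound is close enough, i.e. the query is treated as outside the
    applicability domain and no prediction is made."""
    weighted: List[Tuple[float, float]] = []
    for fp, y in train:
        d = tanimoto_distance(query, fp)
        if d <= d0:
            weighted.append((math.exp(-(d / h) ** 2), y))
    total = sum(w for w, _ in weighted)
    if not weighted or total == 0.0:
        return None
    return sum(w * y for w, y in weighted) / total


if __name__ == "__main__":
    # Toy, made-up data: (fingerprint, activity) pairs, not values from the paper.
    train = [
        (frozenset({1, 2, 3, 4}), 2.0),
        (frozenset({1, 2, 3, 5}), 2.4),
        (frozenset({10, 11, 12}), 5.0),  # structurally unrelated compound
    ]
    query = frozenset({1, 2, 3, 4})
    print(knn_predict(query, train, k=3))            # pulled toward 5.0 by the unrelated compound
    print(vnn_predict(query, train, d0=0.5, h=0.3))  # uses only the two close analogs
    print(vnn_predict(frozenset({20, 21}), train, d0=0.5, h=0.3))  # None: out of domain
```

In this sketch, lowering `d0` limits each prediction to closer structural analogs, which corresponds to advantage (1). The cost is that `vnn_predict` returns `None` for more queries, which is the out-of-domain detection described in advantage (2).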