Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland 21702, United States.
J Chem Inf Model. 2019 Jan 28;59(1):181-189. doi: 10.1021/acs.jcim.8b00597. Epub 2018 Nov 19.
Domain applicability (DA) is a concept introduced to gauge the reliability of quantitative structure-activity relationship (QSAR) predictions. A leading DA metric is ensemble variance, which is defined as the variance of predictions by an ensemble of QSAR models. However, this metric fails to identify large prediction errors in melting point (MP) data, despite the availability of large training data sets. In this study, we examined the performance of this metric on MP data and found that, for most molecules, ensemble variance increased as their structural similarity to the training molecules decreased. However, the metric decreased for "out-of-domain" molecules, i.e., molecules with little to no structural similarity to the training compounds. This explains why ensemble variance fails to identify large prediction errors. In contrast, a new molecular similarity-based DA metric that considers the contributions of all training molecules in gauging the reliability of a prediction successfully identified predictions of MP data for which the errors were large. To validate our results, we used four additional data sets of diverse molecular properties. We divided each data set into a training set and a test set at a ratio of approximately 2:1, ensuring a small fraction of the test compounds are out of the training domain. We then trained random forest (RF) models on the training data and made RF predictions for the test set molecules. Results from these data sets confirm that the new DA metric significantly outperformed ensemble variance in identifying predictions for out-of-domain compounds. For within-domain compounds, the two metrics performed similarly, with ensemble variance marginally but consistently outperforming the new DA metric. The new DA metric, which does not rely on an ensemble of QSAR models, can be deployed with any machine-learning method, including deep neural networks.
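The two kinds of DA metric contrasted in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula of the new metric, so everything below is an assumption made for illustration: fingerprints are represented as sets of on-bits, similarity is Tanimoto similarity, and each training molecule contributes a weight that decays exponentially with its Tanimoto distance to the query. The function names `tanimoto`, `ensemble_variance`, and `similarity_sum`, and the decay parameter `a`, are hypothetical and not taken from the paper.

```python
import math
from statistics import pvariance


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def ensemble_variance(member_predictions) -> float:
    """Variance of the predictions made by the individual ensemble members
    (for a random forest, one prediction per tree)."""
    return pvariance(member_predictions)


def similarity_sum(query: frozenset, training_fps, a: float = 3.0) -> float:
    """Hypothetical similarity-based DA score.

    Every training molecule contributes; the contribution is 1 for an
    identical fingerprint and falls toward 0 as the Tanimoto distance grows.
    The exponential form and the default a=3.0 are assumptions.
    """
    total = 0.0
    for fp in training_fps:
        s = tanimoto(query, fp)
        if s <= 0.0:
            continue  # no shared features: contribution is zero
        d = 1.0 - s  # Tanimoto distance
        total += math.exp(-a * d / s)
    return total


# Toy data: three training fingerprints, one similar query, one dissimilar query.
train = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}), frozenset({2, 3, 4, 6})]
near = frozenset({1, 2, 3, 4})
far = frozenset({7, 8, 9})

near_score = similarity_sum(near, train)
far_score = similarity_sum(far, train)

# If every ensemble member returns the same value for a dissimilar molecule,
# the variance is zero even though the prediction may be unreliable.
flat_variance = ensemble_variance([100.0, 100.0, 100.0])
```

In this toy setup the dissimilar query receives a score of zero from the similarity-based metric, while an ensemble whose members all agree reports zero variance regardless of how far the query lies from the training set. That mirrors the failure mode described above, though the real behavior depends on the fingerprints and models used in the study.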