Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, FDA, 3900 NCTR Road, Jefferson, Arkansas 72079, United States.
Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, 9800 Medical Center Drive, Rockville, Maryland 20850, United States.
Chem Res Toxicol. 2021 Feb 15;34(2):541-549. doi: 10.1021/acs.chemrestox.0c00373. Epub 2021 Jan 29.
Selecting a model in predictive toxicology often involves a trade-off between prediction performance and explainability: should we sacrifice model performance to gain explainability, or vice versa? Here we present a comprehensive study assessing the influence of algorithms and features on model performance in chemical toxicity research. We built over 5000 models for a Tox21 bioassay data set of 65 assays and ∼7600 compounds. Seven molecular representations served as features, and 12 modeling approaches varying in complexity and explainability were employed to systematically investigate the impact of these factors on model performance and explainability. We demonstrated that the end point dictated a model's performance, regardless of the chosen modeling approach (including deep learning) and chemical features. Overall, more complex models such as (LS-)SVM and Random Forest performed marginally better than simpler models such as linear regression and KNN in the presented Tox21 data analysis. Because a simpler model with acceptable performance is often also easier to interpret, it was clearly the preferred choice for the Tox21 data set owing to its better explainability. Given that each data set has its own error structure for both dependent and independent variables, we strongly recommend conducting a systematic study across a broad range of model complexity and feature explainability to identify a model that balances predictivity and explainability.
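The study design and the selection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes a full factorial grid of assays, feature sets, and algorithms, which would give 65 × 7 × 12 = 5460 models; that is consistent with "over 5000", although the abstract does not state how the models were counted. The function names, the complexity ranks, the tolerance, and the example scores are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Counts taken from the abstract.
N_ASSAYS, N_FEATURE_SETS, N_ALGORITHMS = 65, 7, 12


def full_grid(n_assays, n_feature_sets, n_algorithms):
    """Enumerate every (assay, feature set, algorithm) index combination.

    Assumption: one model per combination (full factorial design).
    """
    return list(product(range(n_assays), range(n_feature_sets), range(n_algorithms)))


def pick_simplest_acceptable(scores, complexity, tolerance=0.02):
    """Return the least complex model whose score is within `tolerance` of the best.

    scores:     dict of model name -> performance metric (higher is better)
    complexity: dict of model name -> hypothetical complexity rank (lower is simpler)
    """
    best = max(scores.values())
    acceptable = [name for name, score in scores.items() if best - score <= tolerance]
    return min(acceptable, key=lambda name: complexity[name])


grid = full_grid(N_ASSAYS, N_FEATURE_SETS, N_ALGORITHMS)
print(len(grid))  # 5460

# Made-up scores for a single assay; not results from the paper.
example_scores = {"linear": 0.78, "knn": 0.74, "random_forest": 0.80, "svm": 0.795}
# Hypothetical ranking; the abstract only groups models as simpler or more complex.
example_complexity = {"linear": 1, "knn": 2, "random_forest": 3, "svm": 4}

print(pick_simplest_acceptable(example_scores, example_complexity, tolerance=0.03))  # linear
```

With a tolerance of zero the rule falls back to the best-scoring model, so the tolerance is what encodes how much performance one is willing to give up for explainability.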