Wang Hao, Liu Ruifeng, Schyman Patric, Wallqvist Anders
The Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Bethesda, MD, United States.
Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, United States Army Medical Research and Materiel Command, Frederick, MD, United States.
Front Pharmacol. 2019 Feb 5;10:42. doi: 10.3389/fphar.2019.00042. eCollection 2019.
Improving the accuracy of toxicity prediction models for liver injuries is a key element in evaluating the safety of drugs and chemicals. Mechanism-based information derived from expression (transcriptomic) data, in combination with machine-learning methods, promises to improve the accuracy and robustness of current toxicity prediction models. Deep neural networks (DNNs) have the advantage of automatically assembling the relevant features from a large number of input features. This makes them especially suitable for modeling transcriptomic data, which typically contain thousands of features. Here, we gauged gene- and pathway-level feature selection schemes using single- and multi-task DNN approaches in predicting chemically induced liver injuries (biliary hyperplasia, fibrosis, and necrosis) from whole-genome DNA microarray data. The single-task DNN models showed high predictive accuracy and endpoint specificity: Matthews correlation coefficients for the three endpoints in 10-fold cross validation ranged from 0.56 to 0.89, averaging 0.74 for the best feature sets. The DNN models outperformed Random Forest models in cross validation and performed better than Support Vector Machine models on the external validation datasets. In the cross validation studies, the choice of feature selection scheme had a negligible effect across the studied feature sets. Further evaluation of the models' ability to predict the injury phenotype for non-chemically induced injuries showed that the DNN models performed robustly across these additional external test datasets. Thus, the DNN models learned features specific to the injury phenotype contained in the gene expression data.
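Model performance above is summarized with the Matthews correlation coefficient (MCC), which balances all four cells of the binary confusion matrix and stays informative when injured and non-injured samples are imbalanced. As a minimal sketch of the standard MCC definition for a single endpoint (not the authors' code), the function below assumes labels encoded as 1 (injury present) and 0 (absent), and follows the common convention of returning 0 when any marginal total is zero and the denominator therefore vanishes.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels assumed to be 1 (injury) or 0 (no injury)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, tn, fp, fn


def matthews_corrcoef(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), ranging from -1 to 1."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention (assumed here): a degenerate confusion matrix scores 0.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical predictions for one endpoint: 3 injured, 5 non-injured samples.
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    preds = [1, 1, 0, 0, 0, 0, 0, 1]
    print(round(matthews_corrcoef(truth, preds), 4))  # 7/15, about 0.4667
```

In a 10-fold cross validation, one would compute this value on each held-out fold (or on the pooled out-of-fold predictions) for each endpoint; how the reported 0.56 to 0.89 range and 0.74 average were aggregated is not specified here.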