Beijing Advanced Innovation Center for Food Nutrition and Human Health, College of Environment, Zhejiang University of Technology, Hangzhou 310032, China.
College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China.
Sci Total Environ. 2017 Feb 15;580:1268-1275. doi: 10.1016/j.scitotenv.2016.12.088. Epub 2016 Dec 20.
Endocrine-disrupting chemicals (EDCs), which can threaten ecological safety and harm human health, have caused wide concern, and there is high demand for efficient methods to evaluate potential EDCs in the environment. Herein, an evaluation platform was developed using novel and statistically robust ternary models built with different machine learning methods (linear discriminant analysis, classification and regression tree, and support vector machines). The platform aims to classify chemicals as having agonistic, antagonistic, or no estrogen receptor (ER) activity. A total of 440 chemicals from the literature were selected to derive and optimize the three-class model. One hundred and nine new chemicals that appeared on the 2014 EPA list for EDC screening were used to assess predictive performance by comparing E-screen results with the predictions of the classification models. The best model was obtained using support vector machines (SVM); it recognized agonists and antagonists with accuracies of 76.6% and 75.0%, respectively, on the test set (overall predictive accuracy 75.2%) and achieved a 10-fold cross-validation (CV) accuracy of 73.4%. The external prediction accuracy validated by the E-screen assay was 87.5%, which demonstrates the model's value as a virtual alert for EDCs with ER agonistic or antagonistic activity. The ternary computational model could therefore serve as a faster and less expensive method to identify EDCs that act through nuclear receptors and to classify these chemicals into different mechanism groups.
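The abstract reports three kinds of figures for a three-class model: per-class accuracies, an overall accuracy, and a 10-fold CV accuracy. The following is a minimal sketch, in plain Python, of how such figures can be computed. Everything in it is an assumption made for illustration and none of it comes from the study: the two-dimensional toy data, the nearest-centroid classifier (a simple stand-in, not the LDA, CART, or SVM models of the paper), and all function names. "Per-class accuracy" is taken here to mean the fraction of each true class that is predicted correctly.

```python
import random
from collections import Counter, defaultdict

def per_class_accuracy(y_true, y_pred):
    """Fraction of each true class predicted correctly (assumed meaning of per-class accuracy)."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / total[c] for c in total}

def overall_accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)  # deal each class out round-robin
    return folds

def fit_centroids(X, y):
    """Stand-in classifier (hypothetical, not from the paper): one mean vector per class."""
    counts, sums = Counter(y), {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for d, v in enumerate(row):
            acc[d] += v
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, X):
    """Assign each row to the class with the nearest centroid (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda c: dist2(centroids[c], row)) for row in X]

def cross_val_accuracy(X, y, k=10, seed=0):
    """Pooled accuracy over k held-out folds; each sample is predicted exactly once."""
    hits = 0
    for fold in stratified_folds(y, k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in fold])
        hits += sum(p == y[i] for p, i in zip(preds, fold))
    return hits / len(y)

def make_toy_data(n_per_class=40, seed=0):
    """Synthetic 2-D points around three arbitrary centers; NOT real chemical descriptors."""
    rng = random.Random(seed)
    centers = {"agonist": (0.0, 0.0), "antagonist": (3.0, 0.0), "inactive": (0.0, 3.0)}
    X, y = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            X.append([rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)])
            y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    print("10-fold CV accuracy:", round(cross_val_accuracy(X, y, k=10), 3))
    preds = predict(fit_centroids(X, y), X)
    print("per-class accuracy:", per_class_accuracy(y, preds))
    print("overall accuracy:", round(overall_accuracy(y, preds), 3))
```

Stratifying the folds keeps each class represented in every held-out fold, which matters when class sizes differ. In a real study the stand-in classifier would be replaced by the actual model, and the toy points by molecular descriptors.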