Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China.
Ecotoxicol Environ Saf. 2014 Dec;110:280-7. doi: 10.1016/j.ecoenv.2014.08.026. Epub 2014 Oct 3.
Rapid and accurate identification of endocrine-disrupting chemicals (EDCs) is an important issue in environmental risk assessment. Major EDCs are associated with the androgen receptor (AR) and oestrogen receptors (ERs). Because experimental tests are costly and time-consuming, in silico methods are valuable alternative tools for identifying EDCs. In this study, a large dataset related to EDCs was constructed. Each molecule was represented with seven fingerprints, and computational models were then developed to predict AR and ER binders using four machine learning methods: k-nearest neighbour (kNN), C4.5 decision tree (C4.5 DT), naïve Bayes (NB), and support vector machine (SVM). The best model for predicting AR binders was PubChem Fingerprint-SVM, with an accuracy of 0.84. For ER binders, the best model was Extended Fingerprint-SVM, with an accuracy of 0.79. Moreover, several representative substructure alerts characterizing EDCs, such as phenol, trifluoromethyl, and annelated rings, were identified by combining information gain with substructure frequency analysis. This study provides a systematic computational assessment of EDCs related to AR and ERs, together with significant information on the structural characteristics of these chemicals, which is of great help in identifying EDCs.
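The abstract does not give formulas for the substructure analysis. As a minimal sketch, assuming each fingerprint bit is a binary presence/absence indicator for one substructure, the code below scores a bit with the standard entropy-based information gain and with one commonly used definition of substructure frequency (the share of binders containing the fragment divided by the share of all molecules containing it). That frequency definition, the function names, and the toy data are illustrative assumptions, not necessarily what the study used.

```python
import math
from typing import Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for c in set(labels):
        p = list(labels).count(c) / n
        h -= p * math.log2(p)
    return h


def information_gain(bit: Sequence[int], labels: Sequence[int]) -> float:
    """Reduction in label entropy after splitting on one binary fingerprint bit."""
    n = len(labels)
    ig = entropy(labels)
    for v in (0, 1):
        subset = [y for b, y in zip(bit, labels) if b == v]
        ig -= len(subset) / n * entropy(subset)
    return ig


def substructure_frequency(bit: Sequence[int], labels: Sequence[int], cls: int = 1) -> float:
    """Assumed definition: fraction of class-`cls` molecules containing the fragment,
    divided by the fraction of all molecules containing it (> 1 means enriched)."""
    n_total = len(labels)
    n_class = sum(1 for y in labels if y == cls)
    n_frag_total = sum(bit)
    n_frag_class = sum(1 for b, y in zip(bit, labels) if b == 1 and y == cls)
    if n_class == 0 or n_frag_total == 0:
        return 0.0
    return (n_frag_class / n_class) / (n_frag_total / n_total)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = binder, 0 = non-binder.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    bits = {
        "phenol": [1, 1, 1, 1, 0, 0, 0, 1],         # mostly present in binders
        "uninformative": [1, 0, 1, 0, 1, 0, 1, 0],  # evenly spread across classes
    }
    # Rank candidate substructures by information gain, highest first.
    ranked = sorted(bits.items(), key=lambda kv: -information_gain(kv[1], labels))
    for name, bit in ranked:
        print(f"{name}: IG={information_gain(bit, labels):.3f}, "
              f"freq={substructure_frequency(bit, labels):.2f}")
```

On this toy data the sketch prints `phenol: IG=0.549, freq=1.60` followed by `uninformative: IG=0.000, freq=1.00`. Combining the two scores is one plausible reading of the approach: information gain measures how well a fragment separates the classes, while the frequency ratio indicates which class the fragment is enriched in.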