Du Hanwen, Cai Yingchun, Yang Hongbin, Zhang Hongxiao, Xue Yuhan, Liu Guixia, Tang Yun, Li Weihua
Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China.
Chem Res Toxicol. 2017 May 15;30(5):1209-1218. doi: 10.1021/acs.chemrestox.7b00037. Epub 2017 Apr 26.
Environmental chemicals may affect endocrine systems through multiple mechanisms, one of which is via effects on aromatase (also known as CYP19A1), an enzyme critical for maintaining the normal balance of estrogens and androgens in the body. Therefore, rapid and efficient identification of aromatase-related endocrine disrupting chemicals (EDCs) is important for toxicology and environmental risk assessment. In this study, in silico classification models for predicting aromatase binders/nonbinders were constructed by machine learning methods on the basis of the Tox21 10K compound library. To improve the predictive ability of the models, a combined classifier (CC) strategy that combines different independent machine learning methods was adopted. Model performance was evaluated on a test set and an external validation set containing 1336 and 216 chemicals, respectively. The best model was obtained with the MACCS (Molecular Access System) fingerprint and the CC method, which exhibited an accuracy of 0.84 for the test set and 0.91 for the external validation set. Additionally, several representative substructures for characterizing aromatase binders, such as ketone, lactone, and nitrogen-containing derivatives, were identified using information gain and substructure frequency analysis. Our study provides a systematic assessment of chemicals binding to aromatase. The built models can help to rapidly identify potential EDCs targeting aromatase.
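As an illustration of two techniques named in the abstract, the following minimal Python sketch computes the information gain of a binary substructure flag with respect to binder/nonbinder labels, and combines the outputs of several classifiers by majority vote. The abstract does not state how the CC strategy merges its member models, so the majority-vote rule here is an assumption, not the authors' method. The function names (`entropy`, `information_gain`, `majority_vote`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(has_substructure, labels):
    """Information gain of a binary substructure flag for the class labels.

    has_substructure: 0/1 flags, one per chemical (e.g. one fingerprint bit).
    labels: 0/1 class labels (1 = binder, 0 = nonbinder).
    """
    n = len(labels)
    present = [y for x, y in zip(has_substructure, labels) if x]
    absent = [y for x, y in zip(has_substructure, labels) if not x]
    # Entropy of the labels after splitting on the substructure flag.
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def majority_vote(predictions):
    """Combine per-model 0/1 predictions by strict majority.

    Assumption: simple voting; the paper's actual combination rule
    is not given in the abstract.
    """
    combined = []
    for votes in zip(*predictions):
        combined.append(1 if sum(votes) * 2 > len(votes) else 0)
    return combined


if __name__ == "__main__":
    # Hypothetical toy data: four chemicals, one substructure flag.
    labels = [1, 1, 0, 0]
    perfectly_informative = [1, 1, 0, 0]
    uninformative = [1, 0, 1, 0]
    print(information_gain(perfectly_informative, labels))
    print(information_gain(uninformative, labels))

    # Hypothetical predictions from three independent models.
    model_outputs = [[1, 0, 1], [1, 1, 0], [0, 0, 1]]
    print(majority_vote(model_outputs))
```

A substructure that separates binders from nonbinders perfectly has an information gain equal to the label entropy, while one distributed evenly across both classes has a gain of zero. Ranking fingerprint bits by this value is one common way to shortlist candidate substructures.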