Molecular Graphics and Computation Facility, College of Chemistry, University of California, Berkeley, CA 94720, USA.
Divisions of Environmental Health Sciences and Biostatistics, School of Public Health, University of California, Berkeley, CA 94720, USA.
Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac114.
Nuclear receptors (NRs) are important biological targets of endocrine-disrupting chemicals (EDCs). Identifying chemicals that can act as EDCs and modulate NR function is difficult because in vitro and in vivo screening is too slow and costly to assess the potential hazards of the hundreds of thousands of chemicals to which humans are exposed. Hence, computational approaches are needed to prioritize chemicals for biological testing. Machine learning (ML) techniques offer an alternative that can quickly screen millions of chemicals and identify those that may be EDCs. Computational models of chemical binding to multiple NRs have begun to emerge. Recently, a Nuclear Receptor Activity (NuRA) dataset describing experimentally derived small-molecule activity against various NRs was created. We used the NuRA dataset to develop an ensemble of ML-based models to predict the agonism, antagonism, binding and effector binding of small molecules to nine different human NRs. We defined the applicability domain of the ML models in terms of Tanimoto similarity to the molecules in the training set, which enhanced the performance of the developed classifiers. We further developed a user-friendly web server named 'NR-ToxPred' to predict the binding of chemicals to the nine NRs using the best-performing models for each receptor. This web server is freely accessible at http://nr-toxpred.cchem.berkeley.edu. Users can submit individual chemicals as Simplified Molecular-Input Line-Entry System (SMILES) strings or CAS numbers, or sketch the molecule in the provided space, to predict the compound's activity against the different NRs and the binding mode for each.
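The abstract does not state which fingerprints or similarity cutoff were used to define the applicability domain. As an illustration only, the general idea of a Tanimoto-based applicability domain can be sketched as follows, assuming each molecule is represented by the set of on-bit indices of a binary fingerprint. The function names, the toy fingerprints and the 0.5 threshold are hypothetical and are not taken from NR-ToxPred.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two sets of fingerprint on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity(query: Set[int], training: Iterable[Set[int]]) -> float:
    """Highest Tanimoto similarity between the query and any training molecule."""
    return max((tanimoto(query, fp) for fp in training), default=0.0)


def in_domain(query: Set[int], training: Iterable[Set[int]],
              threshold: float = 0.5) -> bool:
    """Treat a query as inside the applicability domain when its nearest
    training molecule is at least `threshold` similar (assumed cutoff)."""
    return max_similarity(query, training) >= threshold


# Toy fingerprints (hypothetical bit indices, not real molecules).
training_fps = [{1, 2, 3, 4}, {2, 3, 5, 8}]
query_close = {1, 2, 3, 9}   # shares three bits with the first training entry
query_far = {20, 21, 22}     # shares no bits with any training entry

print(in_domain(query_close, training_fps))
print(in_domain(query_far, training_fps))
```

In a workflow of this kind, predictions for queries that fall outside the domain would typically be flagged as less reliable rather than reported with the same confidence as in-domain predictions.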