Collaborations Pharmaceuticals Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.
Global Product Safety, SC Johnson and Son, Inc., Racine, Wisconsin 53404, United States.
Environ Sci Technol. 2020 Nov 3;54(21):13690-13700. doi: 10.1021/acs.est.0c03984. Epub 2020 Oct 21.
The androgen receptor (AR) is a target of interest for endocrine disruption research, as altered signaling can affect normal reproductive and neurological development for generations. In an effort to prioritize compounds with alternative methodologies, the U.S. Environmental Protection Agency (EPA) used data from 11 assays to construct models of AR agonist and antagonist signaling pathways. While these EPA ToxCast AR models require data to assign a bioactivity score, Bayesian machine learning methods can be used for prospective prediction from molecule structure alone. This approach was applied to multiple types of data corresponding to the EPA's AR signaling pathway with proprietary software, Assay Central. The training performance of all machine learning models, including six other algorithms, was evaluated by internal 5-fold cross-validation statistics. Bayesian machine learning models were also evaluated with external predictions of reference chemicals to compare prediction accuracies to published results from the EPA. The machine learning model group selected for further studies of endocrine disruption consisted of continuous AC50 data from the February 2019 release of ToxCast/Tox21. These efforts demonstrate how machine learning can be used to predict AR-mediated bioactivity and can also be applied to other targets of endocrine disruption.
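The internal 5-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. This is not Assay Central or the authors' pipeline: it assumes a plain Bernoulli naive Bayes classifier with Laplace smoothing, binary fingerprint-like feature vectors, and synthetic data. Every function name here (`train_bernoulli_nb`, `predict`, `k_fold_accuracy`, `make_synthetic_fingerprints`) is hypothetical and introduced only for illustration.

```python
import math
import random


def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on binary feature vectors (labels 0/1)."""
    n_features = len(X[0])
    model = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        # Laplace-smoothed probability that each bit is set within this class.
        p_on = [(sum(r[j] for r in rows) + alpha) / (n + 2 * alpha)
                for j in range(n_features)]
        # Smoothed class prior, stored as a log for numerical stability.
        prior = (n + alpha) / (len(X) + 2 * alpha)
        model[label] = (math.log(prior), p_on)
    return model


def predict(model, x):
    """Return the label with the highest log posterior score for vector x."""
    scores = {}
    for label, (log_prior, p_on) in model.items():
        score = log_prior
        for bit, p in zip(x, p_on):
            score += math.log(p if bit else 1.0 - p)
        scores[label] = score
    return max(scores, key=scores.get)


def k_fold_accuracy(X, y, k=5, seed=0):
    """Shuffle once, split into k folds, and return the accuracy on each held-out fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = train_bernoulli_nb([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


def make_synthetic_fingerprints(n=200, n_bits=32, seed=1):
    """Synthetic stand-in for molecular fingerprints; NOT real ToxCast/Tox21 data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Assumption for illustration: the first 4 bits are enriched in "actives".
        x = [int(rng.random() < (0.8 if (label and j < 4) else 0.2))
             for j in range(n_bits)]
        X.append(x)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_fingerprints()
    fold_accuracies = k_fold_accuracy(X, y, k=5)
    print("per-fold accuracy:", [round(a, 2) for a in fold_accuracies])
    print("mean accuracy:", round(sum(fold_accuracies) / len(fold_accuracies), 2))
```

A real workflow would replace the synthetic vectors with fingerprints computed from molecule structures, and would typically report more than accuracy (for example ROC AUC, precision, and recall), since activity classes in screening data are often imbalanced.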