Machine Learning and Large Language Models for Modeling Complex Toxicity Pathways and Predicting Steroidogenesis.

Author information

Lane Thomas R, Vignaux Patricia A, Harris Joshua S, Snyder Scott H, Urbina Fabio, Ekins Sean

Affiliations

Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States of America.

Publication information

Environ Sci Technol. 2025 Jul 15;59(27):13844-13856. doi: 10.1021/acs.est.5c04054. Epub 2025 Jun 27.

DOI:10.1021/acs.est.5c04054

PMID:40576990

Abstract

High-throughput screening and computational models have been effective in predicting chemical interactions with estrogen and androgen receptors, but similar approaches for steroidogenesis remain limited. To address this gap, we developed general steroidogenesis modulation models using data from ∼1,800 chemicals screened in H295R human adrenocortical carcinoma cells. A random forest model was validated using a prospective test set of 20 compounds (14 predicted active, 6 inactive), achieving 80% accuracy with conformal prediction adjustments. In parallel, we built classification and regression models based on IC50 data from ChEMBL for key steroidogenic enzymes, including CYP17A1, CYP21A2, CYP11B1, CYP11B2, 17β-HSD (1/2/3/5), 5α-reductase (1/2), and CYP19A1 (126-9,327 compounds per target). These models enable predictions of both general steroidogenesis inhibition and potential molecular targets. Additionally, we developed a transformer-based model (MolBART) to predict all end points simultaneously and validated this performance. Combined, these models may offer a rapid and scalable system for assessing chemical impacts on steroidogenesis, supporting chemical risk assessment, product stewardship, and regulatory decision-making.
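
The abstract mentions "conformal prediction adjustments" without saying which variant was used. As a rough illustration only, the sketch below shows one common form, class-conditional (Mondrian) inductive conformal prediction, applied to the class probabilities that a classifier such as a random forest might output. The function names, the calibration scores, and the significance level are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def conformal_p_value(calibration_scores: List[float], test_score: float) -> float:
    """Share of calibration nonconformity scores at least as large as the
    test score, with the usual +1 correction for the test example itself."""
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(class_probs: Dict[str, float],
                   calibration: Dict[str, List[float]],
                   significance: float) -> Set[str]:
    """Keep every label whose class-conditional p-value exceeds the
    chosen significance level."""
    labels = set()
    for label, prob in class_probs.items():
        # Nonconformity score: a low predicted probability is "strange".
        score = 1.0 - prob
        if conformal_p_value(calibration[label], score) > significance:
            labels.add(label)
    return labels


# Hypothetical calibration scores (1 - predicted probability of the true
# class), kept separately per class. These numbers are invented.
calibration = {
    "active": [0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.60, 0.70, 0.80],
    "inactive": [0.02, 0.05, 0.10, 0.15, 0.20, 0.25, 0.35, 0.60, 0.65],
}

# A compound the classifier is sure about yields a single-label set.
confident = prediction_set({"active": 0.90, "inactive": 0.10}, calibration, 0.2)

# A borderline compound yields both labels, flagging the call as uncertain.
ambiguous = prediction_set({"active": 0.55, "inactive": 0.45}, calibration, 0.2)
```

Under this scheme a compound can receive one label, both labels, or none, so accuracy is typically reported for the compounds that receive a single-label set. Whether the 80% figure in the abstract was computed this way is not stated there.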

