Rosa Lucca Caiaffa Santos, Sarhan Mariam, Pimentel Andre Silva
Departamento de Química, Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro, RJ 22453-900, Brazil.
Environ Health (Wash). 2025 Jan 27;3(3):321-333. doi: 10.1021/envhealth.4c00218. eCollection 2025 Mar 21.
The local interpretable model-agnostic explanations (LIME) method was used with machine learning models to unveil substructures (toxic alerts) that cause endocrine disruption in chemical compounds. Random forest classifiers were trained on the curated TOX21 data sets to build explainable models. Applying these models to the EDC and EDKB-FDA data sets unveiled the substructures that cause endocrine disruption and yielded stable, more specific, and consistent explanations. Such explanations are essential for trust in and acceptance of the findings, mainly because relevant experimental evidence is difficult to find for the different targets (androgen receptor, estrogen receptor, aryl hydrocarbon receptor, aromatase, and peroxisome proliferator-activated receptors). The approach contributes to the interpretability of machine learning models for these five targets and thereby advances environmental toxicology, where the potential risks of exposure to new compounds must be evaluated carefully. The substructures thiophosphate, sulfamate, anilide, carbamate, sulfamide, and thiocyanate are presented as toxic alerts that cause endocrine disruption, to better understand their potential risks and adverse effects on human health and the environment.
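The explanation step summarized in the abstract can be illustrated with a minimal, standard-library-only sketch of a LIME-style local surrogate. It is not the authors' pipeline: the fragment list, the `black_box` function (a toy stand-in for a trained random forest), its weights, and the kernel settings are all assumptions made for illustration. The sketch switches off fragments present in one compound, queries the model on each perturbed copy, weights the copies by proximity to the original, and fits a weighted ridge regression whose coefficients rank the fragments.

```python
import math
import random

# Hypothetical fragment vocabulary; one bit per substructure.
FRAGMENTS = ["thiophosphate", "carbamate", "hydroxyl", "methyl"]


def black_box(bits):
    """Toy stand-in for a trained classifier, returning P(active).
    The weights are invented for illustration, not taken from the paper."""
    score = -2.0 + 3.0 * bits[0] + 1.5 * bits[1] + 0.1 * bits[2]
    return 1.0 / (1.0 + math.exp(-score))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def explain(instance, predict, names, n_samples=2000,
            kernel_width=0.75, ridge=1e-3, seed=0):
    """Fit a proximity-weighted linear surrogate around one instance.
    Assumes the instance contains at least one fragment."""
    rng = random.Random(seed)
    present = [i for i, bit in enumerate(instance) if bit]
    k = len(present) + 1  # intercept plus one coefficient per present fragment
    xtwx = [[0.0] * k for _ in range(k)]
    xtwy = [0.0] * k
    for _ in range(n_samples):
        # z[j] = 1 keeps the j-th present fragment, 0 removes it.
        z = [rng.randint(0, 1) for _ in present]
        bits = [0] * len(instance)
        for keep, i in zip(z, present):
            bits[i] = keep
        dist = 1.0 - sum(z) / len(z)  # fraction of fragments removed
        w = math.exp(-(dist ** 2) / kernel_width ** 2)  # exponential kernel
        row = [1.0] + [float(v) for v in z]
        y = predict(bits)
        for r in range(k):
            xtwy[r] += w * row[r] * y
            for c in range(k):
                xtwx[r][c] += w * row[r] * row[c]
    for r in range(k):
        xtwx[r][r] += ridge  # small ridge term (also applied to the intercept)
    beta = solve(xtwx, xtwy)
    pairs = [(names[i], beta[j + 1]) for j, i in enumerate(present)]
    return sorted(pairs, key=lambda t: t[1], reverse=True)


# Hypothetical compound containing the first three fragments.
instance = [1, 1, 1, 0]
ranking = explain(instance, black_box, FRAGMENTS)
for name, weight in ranking:
    print(f"{name}: {weight:+.3f}")
```

In this toy setting the fragment with the largest invented weight ranks first, which is the sense in which a surrogate coefficient can flag a substructure as a candidate alert. A real workflow would replace `black_box` with the probability output of a classifier trained on curated assay data and would derive the bits from actual substructure matching.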