Ollitrault Guillaume, Marzo Marco, Roncaglioni Alessandra, Benfenati Emilio, Mombelli Enrico, Taboureau Olivier
Inserm U1133, CNRS UMR 8251, Université Paris Cité, 75013 Paris, France.
Department of Environmental Health Sciences, Laboratory of Environmental Chemistry and Toxicology, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, 20156 Milano, Italy.
Toxics. 2024 Jul 26;12(8):541. doi: 10.3390/toxics12080541.
Endocrine-disrupting chemicals (EDCs) are chemicals that can interfere with homeostatic processes. They are a major public health concern because they can cause long-term adverse effects such as cancer, intellectual impairment, obesity, diabetes, and male infertility. The endocrine system is complex, and the estrogen (E), androgen (A), and thyroid hormone (T) modes of action (EAT) are of major importance. In this context, in silico models that rapidly detect hazardous chemicals are an effective contribution to toxicological assessment. We developed Qualitative Gene expression Activity Relationship (QGexAR) models to predict the propensity of chemicals to disrupt the EAT modalities. We gathered gene expression profiles from the LINCS database measured in two cell lines: MCF7 (breast cancer) and A549 (adenocarcinomic human alveolar basal epithelial cells). We optimized our prediction protocol by testing different feature selection methods and classification algorithms, including CatBoost, XGBoost, Random Forest, SVM, logistic regression, AutoKeras, TPOT, and deep learning models. For each EAT endpoint, the final prediction was a consensus prediction computed from the best model obtained for each cell line. With the available data, we were able to develop predictive models for estrogen receptor binding, androgen receptor binding, and thyroid hormone receptor antagonism, with a consensus balanced accuracy on a validation set ranging from 0.725 to 0.840. The importance of each predictive feature was further assessed to identify known genes and to suggest new genes potentially involved in the mechanisms of action of EAT perturbation.
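The consensus step and the balanced accuracy metric mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the consensus rule, so the version below assumes a simple average of the two per-cell-line probabilities followed by a 0.5 threshold; the function names, the threshold, and the toy numbers are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def consensus_predict(p_mcf7: Sequence[float],
                      p_a549: Sequence[float],
                      threshold: float = 0.5) -> List[int]:
    """Combine per-cell-line probabilities into one binary call per chemical.

    Assumed rule (not from the paper): average the two probabilities and
    label a chemical active (1) when the mean reaches the threshold.
    """
    if len(p_mcf7) != len(p_a549):
        raise ValueError("both models must score the same chemicals")
    return [int((a + b) / 2 >= threshold) for a, b in zip(p_mcf7, p_a549)]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = active)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


# Toy data, invented for illustration only.
y_true = [1, 1, 0, 0, 0, 1]
p_mcf7 = [0.9, 0.4, 0.2, 0.6, 0.1, 0.3]  # hypothetical MCF7 model outputs
p_a549 = [0.7, 0.8, 0.3, 0.2, 0.4, 0.3]  # hypothetical A549 model outputs

y_pred = consensus_predict(p_mcf7, p_a549)
score = balanced_accuracy(y_true, y_pred)
```

Balanced accuracy weights actives and inactives equally, which matters when one class is much rarer than the other, as is common in toxicological data sets.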