Ganesh Hari S, Beykal Burcu, Szafran Adam T, Stossi Fabio, Zhou Lan, Mancini Michael A, Pistikopoulos Efstratios N
Texas A&M Energy Institute, Texas A&M University, College Station, TX, United States of America.
Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, United States of America.
ESCAPE. 2021;50:481-486. doi: 10.1016/b978-0-323-88506-5.50076-0. Epub 2021 Jul 18.
Comprehensively evaluating toxic chemicals and understanding their potential harm to human physiology are vital to mitigating adverse effects following exposure in environmental emergencies. In this work, we develop data-driven classification models to facilitate rapid decision making in such catastrophic events and to predict the estrogenic activity of environmental toxicants as estrogen receptor-α (ERα) agonists or antagonists. By combining high-content analysis, big-data analytics, and machine learning algorithms, we demonstrate that highly accurate classifiers can be constructed for evaluating the estrogenic potential of many chemicals. We follow a rigorous, high-throughput, microscopy-based high-content analysis pipeline to measure the single-cell-level response to benchmark compounds with known effects on the ERα pathway. The resulting high-dimensional dataset is then pre-processed by fitting a non-central gamma probability distribution function to the single-cell data for each feature, compound, and concentration. The characteristic parameters of the fitted distribution, which represent its mean and shape, are used as features for classification with Random Forest (RF) and Support Vector Machine (SVM) algorithms. The results show that the SVM classifier predicts the estrogenic potential of the benchmark chemicals with higher accuracy than the RF algorithm, which misclassifies two antagonist compounds.
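The pre-processing step described in the abstract, which collapses single-cell measurements into distribution parameters per feature, compound, and concentration, can be sketched as follows. This is a minimal illustration and not the authors' pipeline. The abstract does not state how the fit is performed, so the sketch substitutes a method-of-moments fit of an ordinary two-parameter gamma distribution for the non-central gamma fit. The function names, the feature name `nuclear_intensity`, the compound labels, and the synthetic data are all hypothetical.

```python
import random
import statistics
from collections import defaultdict


def gamma_moments_fit(values):
    """Match a two-parameter gamma distribution to the sample by moments.

    Assumption: this stands in for the non-central gamma fit named in the
    abstract, whose exact procedure is not given there.
    Returns (mean, shape, scale).
    """
    mean = statistics.fmean(values)
    var = statistics.variance(values)  # sample variance
    if mean <= 0 or var <= 0:
        raise ValueError("method of moments needs positive mean and variance")
    shape = mean * mean / var  # k = mean^2 / variance
    scale = var / mean         # theta = variance / mean
    return mean, shape, scale


def summarize(cells):
    """Collapse single-cell rows into one feature dict per (compound, concentration).

    `cells` is an iterable of (compound, concentration, feature_name, value).
    """
    groups = defaultdict(list)
    for compound, conc, feature, value in cells:
        groups[(compound, conc, feature)].append(value)

    rows = defaultdict(dict)
    for (compound, conc, feature), values in groups.items():
        mean, shape, _scale = gamma_moments_fit(values)
        rows[(compound, conc)][feature + "_mean"] = mean
        rows[(compound, conc)][feature + "_shape"] = shape
    return dict(rows)


# Synthetic single-cell data (hypothetical compounds and feature).
rng = random.Random(0)
cells = [("compound_A", 1.0, "nuclear_intensity", rng.gammavariate(4.0, 2.0))
         for _ in range(5000)]
cells += [("compound_B", 1.0, "nuclear_intensity", rng.gammavariate(9.0, 1.0))
          for _ in range(5000)]

table = summarize(cells)
for key, features in sorted(table.items()):
    print(key, {name: round(val, 2) for name, val in features.items()})
```

Each entry of `table` is a fixed-length summary of thousands of cells, which is the kind of row that could then be passed to an RF or SVM implementation together with the known agonist or antagonist label of each benchmark compound.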