Gajewicz-Skretna Agnieszka, Wyrzykowska Ewelina, Gromelski Maciej
Laboratory of Environmental Chemoinformatics, Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland.
Sci Total Environ. 2023 Feb 25;861:160590. doi: 10.1016/j.scitotenv.2022.160590. Epub 2022 Dec 5.
The toxicological profile of any chemical is defined by multiple endpoints and testing procedures, including representative test species from different trophic levels. Although computer-aided methods play an increasingly important role in supporting ecotoxicology research and chemical hazard assessment, most recently developed machine learning models target a single, specific endpoint. To overcome this limitation and accelerate the identification of potentially hazardous environmental pollutants, we introduce an approach for quantitative, multi-species modeling. The approach is based on canonical correlation analysis, which finds one or more pairs of uncorrelated linear combinations of the original variables that best capture the overall variability within and between multiple biological responses and predictor variables. Its effectiveness was confirmed with a machine learning model for estimating the acute toxicity of diverse organic pollutants to aquatic species from three trophic levels: algae (Pseudokirchneriella subcapitata), daphnia (Daphnia magna), and fish (Oryzias latipes). The multi-species model achieved a favorable predictive performance that was in line with that of predictive models derived for each aquatic organism individually. Chemical bioavailability and reactivity parameters (n-octanol/water partition coefficient, chemical potential, and molecular size and volume) were important for accurately predicting acute ecotoxicity to the three aquatic organisms. To facilitate use of this approach, an open-source, Python-based script named qMTM (quantitative Multi-species Toxicity Modeling) has been provided.
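To make the canonical correlation step concrete, the following is a minimal sketch of textbook canonical correlation analysis in NumPy. It is not the qMTM script and does not reproduce the published model: the function name `fit_cca`, the small ridge term `reg`, the four descriptor columns, and the synthetic three-species responses are all assumptions introduced for illustration. The sketch whitens both variable blocks and takes a singular value decomposition of the whitened cross-covariance, so that each pair of canonical variates is maximally correlated and successive variates within a block are uncorrelated.

```python
import numpy as np


def fit_cca(X, Y, n_components=1, reg=1e-8):
    """Textbook CCA via whitening + SVD (illustrative helper, not from qMTM).

    Returns the X weights, Y weights, canonical correlations, and the
    column means used for centering.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    n = X.shape[0]

    # Sample covariance blocks; `reg` is a tiny ridge for numerical stability.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        vals, vecs = np.linalg.eigh(S)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Sxx_is, Syy_is = inv_sqrt(Sxx), inv_sqrt(Syy)

    # Singular values of the whitened cross-covariance are the canonical
    # correlations, returned in descending order.
    U, s, Vt = np.linalg.svd(Sxx_is @ Sxy @ Syy_is, full_matrices=False)
    A = Sxx_is @ U[:, :n_components]     # weights for the predictor block
    B = Syy_is @ Vt.T[:, :n_components]  # weights for the response block
    return A, B, s[:n_components], x_mean, y_mean


# Synthetic stand-in data (assumption): 200 "chemicals", 4 descriptors,
# and 3 correlated toxicity responses (e.g. algae, daphnia, fish).
rng = np.random.default_rng(0)
n_samples = 200
X = rng.normal(size=(n_samples, 4))
latent = X @ np.array([1.0, -0.5, 0.3, 0.0])
Y = np.column_stack([0.9 * latent, 0.7 * latent, 0.8 * latent])
Y = Y + 0.3 * rng.normal(size=(n_samples, 3))

A, B, corrs, x_mean, y_mean = fit_cca(X, Y, n_components=2)

# Canonical variates (scores) for each block.
U_scores = (X - x_mean) @ A
V_scores = (Y - y_mean) @ B

# Correlation of the first pair of variates, computed directly from the scores.
r1 = np.corrcoef(U_scores[:, 0], V_scores[:, 0])[0, 1]
# Correlation between the first and second variates within the predictor block.
r_within = np.corrcoef(U_scores[:, 0], U_scores[:, 1])[0, 1]

print("canonical correlations:", corrs)
print("first-pair correlation from scores:", r1)
```

In this sketch, `r1` matches the first entry of `corrs` up to the effect of the ridge term, and `r_within` is close to zero, which is the uncorrelatedness property of the canonical variates. A predictive multi-species model would still need a regression step from the predictor variates to the individual responses, plus external validation; neither is shown here.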