Analyzing spatio-temporal dynamics of dissolved oxygen for the River Thames using superstatistical methods and machine learning.

Authors

He Hankun, Boehringer Takuya, Schäfer Benjamin, Heppell Kate, Beck Christian

Affiliations

Centre for Complex Systems, Queen Mary University of London, London, UK.

University College London, London, UK.

Publication information

Sci Rep. 2024 Sep 12;14(1):21288. doi: 10.1038/s41598-024-72084-w.

Abstract

By employing superstatistical methods and machine learning, we analyze time series data of water quality indicators for the River Thames (UK). The indicators analyzed include dissolved oxygen, temperature, electrical conductivity, pH, ammonium, turbidity, and rainfall, with a specific focus on the dynamics of dissolved oxygen. After detrending, the probability density functions of dissolved oxygen fluctuations exhibit heavy tails that are effectively modeled using q-Gaussian distributions. Our findings indicate that the multiplicative Empirical Mode Decomposition method stands out as the most effective detrending technique, yielding the highest log-likelihood in nearly all fittings. We also observe that the optimally fitted width parameter of the q-Gaussian shows a negative correlation with the distance to the sea, highlighting the influence of geographical factors on water quality dynamics. In the context of same-time prediction of dissolved oxygen, regression analysis incorporating various water quality indicators and temporal features identifies the Light Gradient Boosting Machine as the best model. SHapley Additive exPlanations reveal that temperature, pH, and time of year play crucial roles in the predictions. Furthermore, we use the Transformer, a state-of-the-art machine learning model, to forecast dissolved oxygen concentrations. For long-term forecasting, the Informer model consistently delivers superior performance, achieving the lowest Mean Absolute Error (0.15) and Symmetric Mean Absolute Percentage Error (21.96%) with the 192 historical time steps that we used. This performance is attributed to the Informer's ProbSparse self-attention mechanism, which allows it to capture long-range dependencies in time-series data more effectively than other machine learning models. It effectively recognizes the half-life cycle of dissolved oxygen, with particular attention to critical periods such as morning to early afternoon, late evening to early morning, and key intervals between the 16th and 26th quarter-hours of the previous half-day. Our findings provide valuable insights for policymakers involved in ecological health assessments, aiding in accurate predictions of river water quality and the maintenance of healthy aquatic ecosystems.
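
To make the q-Gaussian fitting step more concrete, the following is a minimal sketch of how a q-Gaussian density and its log-likelihood might be evaluated on detrended fluctuations. The abstract does not state the parameterization, so the form p(x) ∝ [1 + (q − 1)βx²]^(−1/(q − 1)) with 1 < q < 3, the function names, and the sample values are all assumptions made for illustration, not the authors' implementation.

```python
import math


def q_gaussian_logpdf(x, q, beta):
    """Log-density of a zero-mean q-Gaussian (assumed form, 1 < q < 3).

    beta > 0 is the width parameter; a larger beta gives a narrower peak.
    """
    if not (1.0 < q < 3.0) or beta <= 0.0:
        raise ValueError("this sketch covers 1 < q < 3 and beta > 0 only")
    n = 1.0 / (q - 1.0)
    # log of C_q = sqrt(pi) * Gamma(n - 1/2) / (sqrt(q - 1) * Gamma(n));
    # lgamma avoids overflow when q is close to 1 and n becomes large.
    log_c_q = (0.5 * math.log(math.pi) + math.lgamma(n - 0.5)
               - 0.5 * math.log(q - 1.0) - math.lgamma(n))
    return (0.5 * math.log(beta) - log_c_q
            - n * math.log1p((q - 1.0) * beta * x * x))


def q_gaussian_pdf(x, q, beta):
    """Density of the same q-Gaussian."""
    return math.exp(q_gaussian_logpdf(x, q, beta))


def log_likelihood(samples, q, beta):
    """Sum of log-densities over the samples, assumed independent."""
    return sum(q_gaussian_logpdf(x, q, beta) for x in samples)


if __name__ == "__main__":
    # Hypothetical detrended fluctuations, not data from the paper.
    fluctuations = [-0.4, 0.1, 0.0, 2.5, -0.2, 0.3, -3.1, 0.05]
    for q in (1.1, 1.5, 2.0):
        print(q, log_likelihood(fluctuations, q, beta=1.0))
```

Comparing the log-likelihood across candidate (q, β) pairs, or across detrending methods applied to the same series, is one way to rank fits in the spirit of the comparison described above. In this parameterization q = 2 with β = 1 reduces to the standard Cauchy distribution, and q close to 1 approaches a Gaussian with variance 1/(2β).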

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ea9c/11393100/b14c0e593897/41598_2024_72084_Fig1_HTML.jpg
