Dissolved organic carbon estimation in lakes: Improving machine learning with data augmentation on fusion of multi-sensor remote sensing observations.

Author information

Asadollah Seyed Babak Haji Seyed, Safaeinia Ahmadreza, Jarahizadeh Sina, Alcalá Francisco Javier, Sharafati Ahmad, Jodar-Abellan Antonio

Affiliations

Department of Environmental Resources Engineering, State University of New York, College of Environmental Science and Forestry, 1 Forestry Drive, Syracuse, NY 13210, USA; Department of Civil Engineering, University of Alicante, 03690 Alicante, Spain.

Department of Environmental Resources Engineering, State University of New York, College of Environmental Science and Forestry, 1 Forestry Drive, Syracuse, NY 13210, USA.

Publication information

Water Res. 2025 Jun 1;277:123350. doi: 10.1016/j.watres.2025.123350. Epub 2025 Feb 21.

Abstract

This paper presents a novel approach for estimating dissolved organic carbon (DOC) concentrations in lakes, considering both carbon sources and sinks. Despite the critical role of DOC, the combined application of machine learning, as a robust predictor, and remote sensing technology, which reduces costly and time-intensive in-situ sampling, has been underexplored in DOC research. Focusing on lakes in the states of New York, Vermont, and Maine (United States), this study integrates in-situ DOC measurements with surface reflectance bands obtained from Landsat satellites between 2000 and 2020. Using these bands as inputs to a Random Forest (RF) predictive model, the methodology explores the ability of remote sensing data to support large-scale DOC simulation. Initial results indicate low accuracy metrics and significant underestimation due to the imbalanced distribution of DOC samples. Statistical analysis showed that the mean DOC concentration was 5.37 ± 3.37 mg/L (mean ± one standard deviation), with peaks up to 25 mg/L. A distribution of chemical components that is highly skewed toward the lower ranges can lead a model to misrepresent extreme and hazardous events, because their significantly lower occurrence rates let them be overshadowed by more frequent, less critical events. To address this issue, the Synthetic Minority Over-sampling Technique (SMOTE) was applied as a key innovation, generating synthetic samples that enhance RF accuracy and reduce the associated errors. The fusion of in-situ and remote sensing data, combined with machine learning and data augmentation, significantly enhances DOC estimation accuracy, especially in the high concentration ranges that are critical for environmental health. With prediction metrics of RMSE = 1.75, MAE = 1.09, and R = 0.74, RF-SMOTE significantly improves on the metrics obtained from stand-alone RF, particularly in estimating high DOC concentrations. Considering the product's spatial resolution of 30 m, the model's output provides a potential avenue for global application in lake monitoring, even in remote regions where direct sampling is limited. This novel fusion of remote sensing, machine learning, and data augmentation offers valuable insights for water quality management and for understanding carbon cycling in aquatic ecosystems.
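The abstract does not say how SMOTE was adapted to a continuous target such as DOC, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that "rare" samples are those above a hypothetical DOC threshold, that synthetic samples are built by linear interpolation between a rare sample and one of its nearest rare neighbours in band space, and that the target is interpolated with the same weight. It also assumes that R denotes the Pearson correlation coefficient. The names `smote_like_oversample`, `threshold`, `n_new`, and `k` are illustrative and do not come from the paper.

```python
import math
import random


def smote_like_oversample(X, y, threshold, n_new, k=3, seed=0):
    """Return n_new synthetic (features, target) pairs built from samples
    whose target is >= threshold (the assumed 'rare, high-DOC' group)."""
    rng = random.Random(seed)
    rare = [(x, t) for x, t in zip(X, y) if t >= threshold]
    if len(rare) < 2:
        raise ValueError("need at least two rare samples to interpolate")
    k = min(k, len(rare) - 1)
    synth_X, synth_y = [], []
    for _ in range(n_new):
        base_x, base_t = rng.choice(rare)
        # k nearest rare neighbours of the base sample (Euclidean distance).
        neighbours = sorted(
            (p for p in rare if p[0] is not base_x),
            key=lambda p: math.dist(p[0], base_x),
        )[:k]
        nb_x, nb_t = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        synth_X.append([a + lam * (b - a) for a, b in zip(base_x, nb_x)])
        synth_y.append(base_t + lam * (nb_t - base_t))
    return synth_X, synth_y


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


if __name__ == "__main__":
    # Toy reflectance-like features and DOC values (mg/L), purely illustrative.
    bands = [[0.10, 0.20], [0.20, 0.10], [0.80, 0.90], [0.90, 0.80], [0.85, 0.95]]
    doc = [3.0, 4.0, 15.0, 20.0, 25.0]
    new_bands, new_doc = smote_like_oversample(bands, doc, threshold=10.0, n_new=4)
    # The augmented set would then be used to train the regressor.
    train_X, train_y = bands + new_bands, doc + new_doc
```

In a workflow of this kind, the synthetic samples would be added to the training set only, and the error metrics would be computed on held-out in-situ measurements so that synthetic points do not inflate the reported accuracy.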
