Computational Earth Science Group, Earth and Environmental Sciences Division, Los Alamos National Laboratory, Los Alamos, NM, USA.
Physics and Chemistry of Materials Group, Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA.
J Contam Hydrol. 2019 Jan;220:66-97. doi: 10.1016/j.jconhyd.2018.11.010. Epub 2018 Dec 4.
Unsupervised machine learning (ML) is increasingly used for data analytics problems such as feature extraction, blind source separation, exploratory analysis, and model diagnostics. Here, we develop a new unsupervised ML method based on Nonnegative Tensor Factorization (NTF) to identify the original groundwater types (including contaminant sources) present in geochemical mixtures observed in an aquifer. Groundwater types with different geochemical signatures are frequently related to different background and/or contamination sources. Characterizing groundwater mixing processes is challenging but critical for any environmental management project that aims to characterize the fate and transport of contaminants in the subsurface and to perform contaminant remediation. This task typically requires solving complex inverse models of groundwater flow and geochemical transport in the aquifer, where the inverse analysis accounts for the available site data. Usually, the model is calibrated against data characterizing the spatial and temporal distribution of the observed geochemical types. Numerous geochemical constituents and processes may need to be simulated in these models, which further complicates the analyses. In addition, inverse methods may introduce biases through the assumptions made during model development. Here, we substitute the model inversion with an unsupervised ML analysis, which makes no assumptions about the underlying physical and geochemical processes occurring in the aquifer. Our ML methodology, called NTFk, is capable of identifying (1) the unknown number of groundwater types (contaminant sources) present in the aquifer, (2) the original geochemical concentrations (signatures) of these groundwater types, and (3) the spatial and temporal dynamics of the mixing of these groundwater types.
These results are obtained from the measured geochemical data alone, without any additional site information. More generally, the NTFk methodology allows interpretation of large high-dimensional datasets representing diverse spatial and temporal components such as state variables and velocities. NTFk has been tested on synthetic and real-world three-dimensional site datasets. The NTFk algorithm is designed to work with geochemical data expressed as concentrations, ratios of two constituents (for example, isotope ratios), and delta notation (stable isotope ratios normalized to a standard).
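
The unmixing idea can be illustrated with a much simpler, two-dimensional analogue. The sketch below is not the NTFk algorithm: it swaps the tensor factorization for a plain nonnegative matrix factorization with Lee-Seung multiplicative updates, and it scans candidate source counts using reconstruction error alone. The function names (`nmf`, `relative_error`), the synthetic data, and the choice of three hypothetical water types are all assumptions made for illustration.

```python
import numpy as np

def nmf(X, k, n_iter=500, seed=0, eps=1e-12):
    """Factor X (samples x constituents) as W @ H with W, H >= 0.

    W holds the mixing weights of k hypothetical water types in each
    sample; H holds the geochemical signature of each type. Uses
    Lee-Seung multiplicative updates for the Frobenius norm.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def relative_error(X, W, H):
    """Relative Frobenius-norm reconstruction error of W @ H."""
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)

# Synthetic data (an assumption, not site data): 40 samples that mix
# 3 hypothetical water types, each described by 5 constituents.
rng = np.random.default_rng(42)
true_H = rng.random((3, 5)) * 10.0            # signatures (concentrations)
true_W = rng.dirichlet(np.ones(3), size=40)   # mixing fractions, rows sum to 1
X = true_W @ true_H                           # observed mixtures

# Scan candidate numbers of sources and record the reconstruction error.
errors = {k: relative_error(X, *nmf(X, k)) for k in range(1, 6)}
for k, err in errors.items():
    print(f"k={k}: relative error = {err:.4f}")
```

In this toy setting the error would be expected to drop sharply once the candidate count reaches the true number of types and to change little afterwards. A realistic analysis would need a more robust selection criterion than this scan, and would have to handle measurement noise and the time dimension that motivates a tensor rather than a matrix formulation.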