School of Civil Engineering and Environmental Science, University of Oklahoma, Norman, OK 73019, USA.
School of Computer Science, University of Oklahoma, Norman, OK 73019, USA.
Chemosphere. 2021 Jul;275:130124. doi: 10.1016/j.chemosphere.2021.130124. Epub 2021 Feb 26.
This work explores supervised machine learning as a tool for identifying the source of per- and polyfluorinated alkyl substances (PFAS) in water samples from the detected component concentrations. Specifically, it focuses on distinguishing PFAS used in aqueous film forming foam (AFFF) fire suppression applications from PFAS from other sources. Because many sites contaminated with legacy PFOS-based AFFF formulations are dominated by perfluorinated sulfonates, it is tempting to naïvely classify any sample dominated by perfluorinated sulfonates as being of AFFF origin. However, a large fraction of samples do not follow this pattern, including some of the most important cases, such as legacy PFOS-based AFFF far from its source. Although PFAS composition can vary substantially within a site as a result of mobility differences between components and other factors, the hypothesis driving the work is that compositional patterns created in the environment can be recognized across different sites by machine learning and used for source allocation. This work builds on earlier preliminary work by the authors that used a small dataset. It is based on a much larger dataset of 8040 samples, and explores different preprocessing approaches as well as how feature selection affects classification performance. The results strongly support the idea that supervised machine learning based on composition can identify patterns that distinguish PFAS sources, and they provide new insights into the selection of classifiers and features for source identification based on PFAS sample composition.
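To make the contrast between the naive sulfonate rule and composition-based supervised classification concrete, the following is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the four component names, the 50% sulfonate threshold, the normalization to fractional composition as one possible preprocessing step, the 1-nearest-neighbor classifier (a stand-in, not necessarily a classifier evaluated in the study), and all concentration values, which are invented and are not data from the 8040-sample dataset.

```python
from math import dist

# Hypothetical component order; all concentrations below are invented (ng/L).
COMPONENTS = ["PFOS", "PFHxS", "PFOA", "PFHxA"]
SULFONATES = {"PFOS", "PFHxS"}


def to_fractions(conc):
    """Scale concentrations so they sum to 1 (one possible preprocessing step)."""
    total = sum(conc)
    if total == 0:
        return [0.0] * len(conc)
    return [c / total for c in conc]


def naive_rule(conc, threshold=0.5):
    """Naive heuristic: label a sample AFFF if sulfonates exceed the threshold share."""
    frac = to_fractions(conc)
    share = sum(f for name, f in zip(COMPONENTS, frac) if name in SULFONATES)
    return "AFFF" if share > threshold else "other"


def predict_1nn(train, conc):
    """1-nearest-neighbor: return the label of the closest training composition."""
    frac = to_fractions(conc)
    _, label = min(train, key=lambda item: dist(frac, to_fractions(item[0])))
    return label


# Invented labeled samples: (concentrations in COMPONENTS order, source label).
train = [
    ([80, 15, 3, 2], "AFFF"),    # sulfonate-dominated AFFF-labeled sample
    ([10, 30, 10, 50], "AFFF"),  # AFFF-labeled sample that is not sulfonate-dominated
    ([5, 5, 60, 30], "other"),
    ([10, 5, 50, 35], "other"),
]

query = [12, 28, 12, 48]
print("naive rule:", naive_rule(query))
print("1-NN:      ", predict_1nn(train, query))
```

With these invented numbers the two approaches disagree on the query sample: its sulfonate share is 40%, so the threshold rule labels it "other", while the nearest labeled composition carries the AFFF label. This illustrates only the mechanics of learning from labeled compositions; it says nothing about real classification performance.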