Thorson Jacob, Collier-Oxandale Ashley, Hannigan Michael
Mechanical Engineering, University of Colorado Boulder, Boulder, CO 80309, USA.
Environmental Engineering, University of Colorado Boulder, Boulder, CO 80309, USA.
Sensors (Basel). 2019 Aug 28;19(17):3723. doi: 10.3390/s19173723.
An array of low-cost sensors was assembled and tested in a chamber environment in which several pollutant mixtures were generated. Four classes of sources were simulated: mobile emissions, biomass burning, natural gas emissions, and gasoline vapors. A two-step regression and classification method was developed and applied to the sensor data from this array. We first applied regression models to estimate the concentrations of several compounds, and then applied classification models trained on those estimates to identify the presence of each source. The regression models included forms of multiple linear regression, random forests, Gaussian process regression, and neural networks. The regression models with human-interpretable outputs were examined to understand the utility of each sensor signal. The classification models included logistic regression, random forests, support vector machines, and neural networks. The best combination of models was selected by maximizing the F score on ten-fold cross-validation data. The highest F score, as calculated on testing data, was 0.72; it was produced by combining a multiple linear regression model that used the full array of sensors with a random forest classification model.
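The two-step idea described above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. It assumes a single sensor signal and a one-variable least-squares fit in place of the multiple linear regression over the full array, a fixed concentration threshold in place of the random forest classifier, and the F1 form of the F score (the abstract does not state which variant was used). All data values, the threshold, and the function names are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y ~ a*x + b for one predictor.

    Stand-in (assumption) for the multiple linear regression step.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


def f1_score(y_true, y_pred):
    """F1 score: harmonic mean of precision and recall for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Step 1 (regression): calibrate sensor signal against known concentrations.
# Made-up calibration data: signal in volts, concentration in ppm.
signal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
conc = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
a, b = fit_line(signal, conc)

# Step 2 (classification): use the estimated concentrations, not the raw
# signals, to decide whether a source is present.
new_signal = [0.15, 0.45, 0.55, 0.25, 0.35]
source_present = [False, True, True, False, False]  # hypothetical labels
estimates = [a * s + b for s in new_signal]

THRESHOLD_PPM = 3.0  # hypothetical decision rule replacing the random forest
predicted = [c > THRESHOLD_PPM for c in estimates]

score = f1_score(source_present, predicted)
print(f"F1 = {score:.2f}")
```

In a fuller version, each candidate regression and classification pair would be scored this way on every cross-validation fold, and the pair with the highest mean F score would be kept.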