McCarthy Ryan A, Gupta Ananya Sen, Kubicek Bernice, Awad Andrew M, Martinez Andres, Marek Rachel F, Hornbuckle Keri C
Department of Electrical and Computer Engineering, University of Iowa, Iowa City, IA 52242 USA.
Department of Civil and Environmental Engineering and IIHR-Hydroscience and Engineering, University of Iowa, Iowa City, IA 52242 USA.
IEEE Access. 2020;8:147738-147755. doi: 10.1109/ACCESS.2020.3013108. Epub 2020 Aug 14.
The main contribution of this interdisciplinary work is a robust computational framework to autonomously discover and quantify previously unknown associations between well-known (target) and potentially unknown (non-target) toxic industrial air pollutants. In this work, the variability of polychlorinated biphenyl (PCB) data is evaluated using a combination of statistical, signal processing, and graph-based informatics techniques to interpret the raw instrument signal from gas chromatography-tandem mass spectrometry (GC/MS/MS) data sets. Specifically, minimum mean-squared error techniques from the adaptive signal processing literature are extended to detect and separate coeluted (overlapped) peaks in the raw instrument signal. A graph-based visualization is provided that bridges two complementary approaches to quantitative pollution studies: (i) peak-cognizant target analysis, which limits data analysis to a few well-known compounds, and (ii) chemometric analysis, a large-scale statistical data analysis that is agnostic of specific compounds. Further, peak fitting techniques based on L2 error minimization are employed to autonomously calculate the amount of each PCB present, with a normalized mean square error of -18.4851 dB. Graph-based visualizations of associations between known and unknown compounds are developed through principal component analysis, and both fuzzy c-means (FCM) and k-means clustering techniques are implemented and compared. The efficiency of these methods is compared, using 150 air samples analyzed for individual PCBs with GC/MS/MS, against traditional target-only techniques that perform analysis across only the known (target) PCBs. Parameter optimization techniques are employed to evaluate the relative contribution of PCB signals against ten potential source signals representing legacy signatures from the historical manufacture of Aroclors and modern sources of PCBs produced as byproducts of pigment and polymer manufacturing.
Aroclors 1232, 1254, 1016, and 1221, as well as the non-Aroclor congener 3,3'-dichlorobiphenyl (PCB 11), were found in many of the samples as unique source signals that describe the PCB mixtures in air samples collected from Chicago, IL.
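The abstract describes L2-based peak fitting for coeluted peaks and reports the fit quality as a normalized mean square error (NMSE) in decibels. The following is a minimal illustrative sketch of that idea, not the authors' implementation. It assumes Gaussian peak shapes with known centers and widths, so that only the peak heights are estimated by linear least squares. The function names (`gaussian`, `fit_amplitudes`, `nmse_db`) and the synthetic signal are hypothetical and chosen only for illustration.

```python
import numpy as np

def gaussian(t, center, width):
    """Unit-height Gaussian peak shape (an assumed peak model)."""
    return np.exp(-0.5 * ((t - center) / width) ** 2)

def fit_amplitudes(t, signal, centers, widths):
    """L2 (least-squares) estimate of peak heights for fixed centers and widths."""
    # One column per peak; the heights are the least-squares solution.
    basis = np.column_stack([gaussian(t, c, w) for c, w in zip(centers, widths)])
    amplitudes, *_ = np.linalg.lstsq(basis, signal, rcond=None)
    return amplitudes, basis @ amplitudes

def nmse_db(signal, fitted):
    """NMSE in dB: 10 * log10(sum of squared error / sum of squared signal)."""
    err = np.sum((signal - fitted) ** 2)
    return 10.0 * np.log10(err / np.sum(signal ** 2))

# Synthetic example (not real GC/MS/MS data): two overlapping peaks plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 500)
centers, widths = [4.5, 5.5], [0.4, 0.4]
true_amps = np.array([3.0, 2.0])
clean = sum(a * gaussian(t, c, w) for a, c, w in zip(true_amps, centers, widths))
signal = clean + 0.05 * rng.standard_normal(t.size)

amps, fitted = fit_amplitudes(t, signal, centers, widths)
print("estimated heights:", amps)
print("NMSE (dB):", nmse_db(signal, fitted))
```

On this scale, a more negative value means a closer fit. The reported -18.4851 dB corresponds to a ratio of error energy to signal energy of about 10^(-1.85), or roughly 1.4%.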