McCarthy Ryan A, Sen Gupta Ananya
Department of Electrical and Computer Engineering, The University of Iowa, Iowa City, IA 52242, USA.
Iowa Technology Institute (ITI), The University of Iowa, Iowa City, IA 52242, USA.
IEEE Access. 2021;9:24727-24737. doi: 10.1109/access.2021.3056955. Epub 2021 Feb 10.
The aim of this interdisciplinary work is a robust signal processing and autonomous machine learning framework that associates well-known (target) peaks, as well as any potentially unknown (non-target) peaks, present within raw gas chromatography-tandem mass spectrometry (GC/MS/MS) instrument signal. In particular, this work evaluates the ability of three machine learning algorithms to autonomously associate raw signal peaks, based on their training and testing accuracy. A target is a known congener that is expected to be present within the raw instrument signal, and a non-target is an unknown or unexpected compound. Autonomously identifying target peaks within the GC/MS/MS signal and associating them with non-target peaks can help improve the analysis of collected samples. Association of peaks refers to classifying peaks as known congeners regardless of whether the peak is a target or a non-target. The uncertainty of peaks fitted and discovered in raw GC/MS/MS instrument signals is assessed to create topographical illustrations of target-annotated peaks among raw instrument signals from samples collected at diverse locations in the Chicago area. An "annotated peak" is a peak found at a specific retention time that has been assigned to a known congener. Adaptive signal processing techniques are used to smooth the data, correct baseline drift, and detect and separate coeluted (overlapping) peaks in the raw instrument signal, providing the key features for classification. In total, 150 air samples collected across Chicago, IL, are analyzed for individual polychlorinated biphenyls (PCBs) with GC/MS/MS. Of the data, 80% is used to train classification of target PCBs, and the remaining 20% is evaluated to identify consistently occurring non-target peaks and associate them with target PCBs. A random forest classifier is used to associate identified peaks with target PCB peaks. Geographical topographical representations of target PCBs in the raw instrument signal demonstrate how PCBs accumulate and degrade in certain locations.
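The preprocessing and data-split steps named in the abstract (smoothing, baseline-drift correction, peak detection, and an 80/20 split of 150 samples) can be illustrated with a minimal sketch. The abstract does not specify the filters or the peak detector used, so everything below is an assumption made for illustration: the moving-average smoother, the rolling-minimum baseline estimate, the local-maximum peak finder, the synthetic chromatogram, and all function names and parameter values are hypothetical stand-ins rather than the authors' method. Coelution separation and the random forest classifier are not reproduced here.

```python
import math
import random


def moving_average(signal, window=7):
    """Smooth with a centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def rolling_min_baseline(signal, window=101):
    """Estimate a slowly varying baseline as the minimum in a sliding window."""
    half = window // 2
    return [min(signal[max(0, i - half): i + half + 1])
            for i in range(len(signal))]


def find_peaks(signal, min_height):
    """Return indices of local maxima whose height is at least min_height."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] >= min_height
            and signal[i] > signal[i - 1]
            and signal[i] >= signal[i + 1]]


def synthetic_chromatogram(n=400, centers=(100, 250), width=6.0,
                           drift=0.01, seed=0):
    """Hypothetical test signal: Gaussian peaks + linear drift + small noise."""
    rng = random.Random(seed)
    return [sum(10.0 * math.exp(-0.5 * ((t - c) / width) ** 2) for c in centers)
            + drift * t + rng.gauss(0.0, 0.05)
            for t in range(n)]


def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle reproducibly, then split into training and held-out subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Preprocess one synthetic signal: smooth, remove baseline, detect peaks.
raw = synthetic_chromatogram()
smoothed = moving_average(raw, window=7)
baseline = rolling_min_baseline(smoothed, window=101)
corrected = [s - b for s, b in zip(smoothed, baseline)]
peaks = find_peaks(corrected, min_height=5.0)

# Split 150 sample indices 80/20, mirroring the proportions in the abstract.
train_ids, test_ids = train_test_split(range(150))
```

In a fuller pipeline, features extracted at each detected peak (for example retention time, height, and width) would be passed to a classifier; the choice of features here is likewise an assumption, since the abstract only refers to "key feature extraction".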