Using deep learning to evaluate peaks in chromatographic data.

Author information

Risum Anne Bech, Bro Rasmus

Affiliations

Department of Food Science, University of Copenhagen, Denmark.

Publication information

Talanta. 2019 Nov 1;204:255-260. doi: 10.1016/j.talanta.2019.05.053. Epub 2019 May 22.

Abstract

Analysis of untargeted gas-chromatographic data is time-consuming. The PARAFAC2 (PARAllel FACtor analysis 2) based PARADISe (PARAFAC2 based Deconvolution and Identification System) approach, introduced in 2017, made this task considerably more time-efficient. However, several manual steps that require data-analytical expertise remain. One of these is deciding whether each PARAFAC2-resolved component represents a peak suitable for integration. Because peaks may change in both shape and location along the elution time axis, this problem cannot readily be solved with a linear classifier such as PLS-DA (Partial Least Squares regression for Discriminant Analysis). As part of our ongoing efforts to further automate the analysis of Gas Chromatography with Mass Spectrometry (GC-MS) data, we therefore explore a convolutional neural network classifier capable of handling these shifts and variations in shape. The theory of convolutional neural networks and their application to vector samples are briefly explained, and performance is tested against a PLS-DA classifier, a shallow artificial neural network and a locally weighted regression model. The models are built on a training set of PARAFAC2-resolved components from eight different aroma-related GC-MS runs, totalling over 70,000 elution profile samples, and validated on another, independent GC-MS dataset. Based on Receiver Operating Characteristic (ROC) curves and manual analysis of the misclassified cases, the convolutional network is shown to consistently outperform the competing models, yielding an Area Under the Curve (AUC) value of 0.95 for peak classification. Examples illustrate that this new approach provides a convincing means to automatically assess modelled elution profiles of chromatographic data and thereby remove this laborious manual step.
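The argument that shifted peaks defeat a linear classifier but not a convolutional one can be illustrated with a minimal plain-Python sketch. It is not the authors' network: it assumes synthetic Gaussian elution profiles and a hand-set kernel instead of learned weights, and the helpers `gaussian_peak`, `linear_score`, `conv1d_valid` and `conv_maxpool_score` are hypothetical names introduced for illustration. A fixed linear weight vector (standing in for a PLS-DA-style model) scores the same peak very differently once it moves along the time axis, whereas convolution followed by global max pooling returns the same score.

```python
import math

# Hypothetical helper: synthetic elution profile with one Gaussian peak
# (unit height, zero baseline). Real resolved profiles need not be Gaussian.
def gaussian_peak(n, center, width):
    return [math.exp(-0.5 * ((i - center) / width) ** 2) for i in range(n)]

# A fixed linear model scores a profile as the dot product w . x,
# so its output depends on where the signal sits on the time axis.
def linear_score(x, w):
    return sum(wi * xi for wi, xi in zip(w, x))

# 1D cross-correlation in 'valid' mode: the same kernel weights are applied
# at every position (weight sharing), as in a CNN convolution layer.
def conv1d_valid(x, kernel):
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]

# Convolution -> ReLU -> global max pooling: keeps the strongest response
# regardless of the position at which it occurred.
def conv_maxpool_score(x, kernel):
    return max(max(0.0, v) for v in conv1d_valid(x, kernel))

n = 100
early = gaussian_peak(n, center=30, width=3)  # peak eluting early
late = gaussian_peak(n, center=60, width=3)   # same peak, shifted later

# Assumption: linear weights that match the early peak exactly
# (a best case for the linear model on unshifted data).
w = gaussian_peak(n, center=30, width=3)

# Assumption: a hand-set peak-shaped kernel instead of learned weights.
kernel = gaussian_peak(15, center=7, width=3)

print("linear, early:   ", linear_score(early, w))
print("linear, late:    ", linear_score(late, w))
print("conv+pool, early:", conv_maxpool_score(early, kernel))
print("conv+pool, late: ", conv_maxpool_score(late, kernel))
```

In a real convolutional network the kernels are learned from labelled profiles and several layers are stacked; the sketch only shows why weight sharing across positions combined with pooling makes the score insensitive to where the peak elutes.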