Computer Science Department, School of Science, Loughborough University, Loughborough, United Kingdom.
Centre for Analytical Science, School of Science, Loughborough University, Loughborough, United Kingdom.
PLoS One. 2022 Apr 12;17(4):e0265399. doi: 10.1371/journal.pone.0265399. eCollection 2022.
Volatile organic compounds (VOCs) in human breath can reveal a wide range of health conditions and can be used for fast, accurate and non-invasive diagnostics. Gas chromatography-mass spectrometry (GC-MS) is used to measure VOCs, but its application is limited by expert-driven data analysis that is time-consuming and subjective and may introduce errors. We propose a machine learning-based system for GC-MS data analysis that exploits the pattern recognition ability of deep learning to learn and automatically detect VOCs directly from raw data, thus bypassing expert-led processing. We evaluate this new approach on clinical samples with four types of convolutional neural networks (CNNs): VGG16, VGG-like, densely connected and residual CNNs. The proposed machine learning methods outperformed the expert-led analysis, detecting a significantly higher number of VOCs in a fraction of the time while maintaining high specificity. These results suggest that the proposed approach can support the large-scale deployment of breath-based diagnosis by reducing time and cost and by increasing accuracy and consistency.