Rotteveel Adan, Lee Wen-Yee, Kountouri Zoi, Stefanou Nikolas, Kivell Howard, Gluck Clifford, Zhang Shuguang, Mershin Andreas
Hcyon Technology, Amsterdam, North Holland, The Netherlands.
Department of Chemistry and Biochemistry, University of Texas El Paso, El Paso, Texas, United States of America.
PLoS One. 2025 May 30;20(5):e0314742. doi: 10.1371/journal.pone.0314742. eCollection 2025.
Prostate cancer (PCa) is a major and increasingly global health concern, and the severe limitations of current screening and diagnostic tools lead to unnecessary, invasive biopsy procedures. While gas chromatography-mass spectrometry (GC-MS) has been used to detect urinary volatile organic compounds (VOCs) associated with PCa, efforts to identify consistent molecular biomarkers have failed to generalize across studies. Inspired by the olfactory diagnostic capabilities of medical detection dogs, we do not reduce chromatograms to a list of compounds and concentrations. Instead, we deploy a machine learning approach that bypasses molecular identification: PCa "scent character" signatures are extracted from raw time series data that are transformed into image representations and classified with convolutional neural networks. To address confounding factors such as sample-source bias, we implement a multi-step pre-processing and debiasing pipeline that includes empirical Bayes correction, baseline drift removal, and domain adversarial learning. The resulting model achieves classification performance on par with similarly trained canines, with a recall of 88% and an F1-score of 0.78. These findings demonstrate that, at least in the context of PCa detection from urine, machine learning-based scent signature analysis can serve as a fully non-invasive diagnostic alternative; these early results are also relevant to the wider emergent field of medical machine olfaction.
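The abstract reports recall and F1-score but not precision. Since F1 is the harmonic mean of precision and recall, the implied precision can be estimated from the two reported figures. The short Python sketch below does this arithmetic; the function name `precision_from_recall_f1` is hypothetical, and the result assumes both figures come from the same evaluation set and ignores rounding in the reported values.

```python
def precision_from_recall_f1(recall: float, f1: float) -> float:
    """Solve F1 = 2*P*R / (P + R) for precision P.

    Rearranging gives P = F1 * R / (2*R - F1).
    """
    denom = 2.0 * recall - f1
    if denom <= 0:
        # No positive precision can produce this recall/F1 combination.
        raise ValueError("F1 must be smaller than 2 * recall")
    return f1 * recall / denom


# Figures reported in the abstract: recall 88%, F1-score 0.78.
implied_precision = precision_from_recall_f1(recall=0.88, f1=0.78)
print(f"implied precision: {implied_precision:.2f}")  # implied precision: 0.70
```

Under these assumptions, roughly 7 in 10 samples flagged as PCa would be true positives. This is an estimate derived from rounded summary figures, not a value stated by the authors.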