Carapito Â, Fernandes Ferreira V S, Silva Ferreira A C, Teixeira-Marques A, Henrique R, Jerónimo C, Roque A C A, Carvalho F, Pinto J, Guedes de Pinho P
Associate Laboratory i4HB - Institute for Health and Bioeconomy, University of Porto, Porto, Portugal; UCIBIO - Applied Molecular Biosciences Unit, Laboratory of Toxicology, Faculty of Pharmacy, University of Porto, Porto, Portugal.
Cork Supply Portugal, S.A., Sao Paio de Oleiros, Portugal.
Talanta. 2025 Aug 26;297(Pt B):128749. doi: 10.1016/j.talanta.2025.128749.
Early detection of bladder cancer (BC) remains a major clinical challenge due to the limitations of current diagnostic methods, which are often invasive, expensive, or insufficiently sensitive, particularly for early-stage disease. Metabolomics approaches, when integrated with machine learning (ML) techniques, offer a powerful platform for identifying novel, non-invasive biomarkers. In this study, urinary volatile organic compounds (VOCs) were analysed from 87 BC patients and 90 age- and sex-matched cancer-free controls using headspace solid-phase microextraction coupled with gas chromatography-mass spectrometry (HS-SPME/GC-MS). Of the 90 VOCs identified, 27 were selected and used to train five ML algorithms: random forest (RF), support vector machine (SVM), partial least squares-discriminant analysis (PLS-DA), extreme gradient boosting (XGBoost), and k-nearest neighbors (k-NN). Model performance was evaluated using cross-validation and an independent validation set, with metrics including area under the curve (AUC), sensitivity, specificity, and accuracy. RF achieved the highest performance using all 27 features (AUC = 0.913; sensitivity, specificity, and accuracy = 85%). After feature selection, an eight-VOC panel improved performance on the validation set (AUC = 0.872; sensitivity = 89%, specificity = 92%, accuracy = 91%). The panel included ketones, aldehydes, a short fatty alcohol, and a phenol compound; seven were elevated in BC and one (acetone) was decreased. This panel outperformed FDA-approved urinary assays and closely matched the specificity of urine cytology. These findings underscore the promise of VOC-based urinary biomarkers, in combination with ML, for the non-invasive detection of BC. Further large-scale validation studies are essential to confirm diagnostic utility and enable clinical translation.
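The evaluation metrics named in the abstract (AUC, sensitivity, specificity, and accuracy) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the labels and scores below are made-up toy values, the 0.5 decision threshold is an assumption, and the function names `diagnostic_metrics` and `auc_from_scores` are hypothetical helpers introduced only for illustration.

```python
def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = BC, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return {
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
        "accuracy": (tp + tn) / len(y_true),
    }


def auc_from_scores(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data only (NOT from the study): 4 hypothetical patients, 4 hypothetical controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1]    # hypothetical classifier outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]       # assumed 0.5 threshold

metrics = diagnostic_metrics(y_true, y_pred)
auc = auc_from_scores(y_true, scores)
```

AUC depends only on how the scores rank patients against controls, whereas sensitivity, specificity, and accuracy depend on the chosen threshold. That is one way a panel can report a lower AUC yet higher thresholded metrics, as with the 27-feature and eight-VOC results quoted in the abstract.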