Department of Chemistry, University of Turin, Italy; Centro Regionale Antidoping, Orbassano, TO, Italy.
Centro Regionale Antidoping, Orbassano, TO, Italy.
J Pharm Biomed Anal. 2024 Jul 15;244:116113. doi: 10.1016/j.jpba.2024.116113. Epub 2024 Mar 20.
Urinary sex hormones are investigated as potential biomarkers for the early detection of breast cancer. The aim is to evaluate their relevance and applicability, in combination with supervised machine-learning data analysis, toward the ultimate goal of extensive screening.
Sex hormones were determined in urine samples collected from 250 post-menopausal women (65 healthy and 185 with breast cancer) recruited among the clinical patients of the Candiolo Cancer Institute FPO-IRCCS (Torino, Italy). Two analytical procedures based on UHPLC-MS/HRMS were developed and comprehensively validated to quantify 20 free and conjugated sex hormones in urine samples. The quantitative data were processed by seven machine learning algorithms, and the efficiency of the resulting models was compared.
Among the tested models aimed at relating urinary estrogen and androgen levels to the occurrence of breast cancer, Random Forest (RF) proved to outperform all the other supervised classification approaches, including Partial Least Squares - Discriminant Analysis (PLS-DA), in terms of effectiveness and robustness. The final optimized model, built on only five biomarkers (testosterone-sulphate, alpha-estradiol, 4-methoxyestradiol, DHEA-sulphate, and epitestosterone-sulphate), achieved approximately 98% diagnostic accuracy on replicated validation sets. To balance the less-represented population of healthy women, a Synthetic Minority Oversampling TEchnique (SMOTE) data oversampling approach was applied.
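The oversampling step can be illustrated with a minimal sketch of the general SMOTE idea: each synthetic sample is placed at a random position on the segment joining a minority-class sample and one of its nearest minority-class neighbours. This is not the authors' implementation. The function name `smote`, the neighbour count `k=5`, and the random placeholder data are assumptions made for illustration only; the class sizes (65 healthy, 185 with breast cancer) and the five-feature width are the only values taken from the abstract.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority-class samples.

    Hypothetical helper: k and seed are illustrative defaults, not study values.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base sample (itself excluded)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # relative position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Placeholder data only: 65 "healthy" rows with 5 made-up feature values each.
data_rng = random.Random(1)
healthy = [[data_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(65)]
n_patients = 185

# 185 - 65 = 120 synthetic rows are needed to equalise the two classes.
new_points = smote(healthy, n_patients - len(healthy))
balanced_healthy = healthy + new_points
print(len(balanced_healthy))
```

In practice, oversampling of this kind is normally applied to the training portion only, so that synthetic points never leak into the validation sets; whether the study proceeded this way is not stated in the abstract.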
Through optimization of its tunable hyperparameters, the RF algorithm showed great potential for early breast cancer detection, as it provides a clear ranking of the biomarkers and their relative efficiency. This allows the final diagnostic model to be grounded on a restricted selection of only five steroid biomarkers, as is desirable for noninvasive tests intended for wide screening.