Prospective cohort study integrating plasma proteomics and machine learning for early risk prediction of prostate cancer.

Authors

Chen Yongming, Long Tianxin, Wang Miao, Liu Shengjie, Lv Zhengtong, Jiang Yuxiao, Hou Huimin, Liu Ming

Affiliations

Beijing Hospital, National Center of Gerontology, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China.

State Key Laboratory of Cardiovascular Disease, Department of Cardiology, Fuwai Hospital, National Center for Cardiovascular Disease, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.

Publication information

Int J Surg. 2025 Sep 1;111(9):6123-6134. doi: 10.1097/JS9.0000000000002805. Epub 2025 Jun 28.

Abstract

BACKGROUND

Early detection of prostate cancer (PCa) remains a clinical challenge. Plasma proteomics offers a non-invasive way to identify individuals at elevated risk before symptom onset or prostate-specific antigen (PSA) elevation.

METHODS

We quantified 1463 plasma proteins in 23,825 PCa-free men from the UK Biobank (UKB). Participants were split into training and validation sets. Cox regression and Light Gradient Boosting Machine (LightGBM) with forward feature selection were used to identify and rank predictive proteins. Model performance was assessed by area under the receiver operating characteristic curve (AUC) in the validation set, and SHapley Additive exPlanations (SHAP) values were used to interpret feature contributions.
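As an illustration only, the sketch below shows one way greedy forward feature selection scored by AUC can work. It is not the authors' pipeline: the abstract does not give the selection criterion, stopping rule, or model settings, so `min_gain`, the feature names, and the toy data are all hypothetical, and a trivial sum-of-features scorer (`sum_score`) stands in for LightGBM so that the example runs with the standard library alone.

```python
from typing import Callable, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random non-case.

    Ties between a case and a non-case count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sum_score(columns: list) -> list:
    """Placeholder model (assumption): row-wise sum of the chosen columns.

    In a real analysis this would be a fitted model such as LightGBM,
    trained on the training set and scored on held-out data.
    """
    return [sum(row) for row in zip(*columns)]


def forward_select(
    features: dict,
    labels: Sequence[int],
    fit_score: Callable[[list], Sequence[float]],
    min_gain: float = 0.001,  # hypothetical stopping rule
) -> list:
    """Greedily add the feature that most improves AUC.

    Stops when the best candidate improves AUC by less than min_gain.
    Returns (feature name, AUC after adding it) in selection order.
    """
    selected = []
    history = []
    best = 0.5  # AUC of a non-informative model
    remaining = set(features)
    while remaining:
        trials = {
            name: auc(labels, fit_score([features[f] for f in selected + [name]]))
            for name in remaining
        }
        # sorted() makes tie-breaking deterministic (alphabetical).
        name = max(sorted(trials), key=trials.get)
        if trials[name] - best < min_gain:
            break
        best = trials[name]
        selected.append(name)
        remaining.remove(name)
        history.append((name, best))
    return history


# Toy data (hypothetical): 3 non-cases followed by 3 cases.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "protein_a": [0.1, 0.4, 0.3, 0.8, 0.9, 0.35],
    "protein_b": [0.2, 0.1, 0.5, 0.4, 0.3, 0.9],
    "noise": [0.5, 0.9, 0.1, 0.2, 0.8, 0.4],
}
history = forward_select(features, labels, sum_score)
for name, value in history:
    print(f"{name}: AUC after adding = {value:.3f}")
```

On this toy data the loop keeps the two informative features and stops before adding the noise column. For simplicity the sketch scores candidates on the same rows it selects on; a study design like the one described would evaluate on a separate validation set.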

RESULTS

TSPAN1 (tetraspanin 1) and GP2 (glycoprotein 2) consistently ranked as top predictors across all analyses. In the training set, both proteins remained significantly associated with PCa risk after Bonferroni correction in multivariable Cox models. LightGBM with forward selection further prioritized TSPAN1 and GP2 as key contributors, and SHAP analysis confirmed their dominant importance. In the validation set, a model combining TSPAN1, GP2, and demographic variables achieved an AUC of 0.728 for overall PCa prediction and 0.760 for 5-year risk. Based on Youden Index-derived thresholds, the high-expression groups for TSPAN1 and GP2 were associated with hazard ratios of 1.75 and 1.60, respectively. Longitudinal profiling showed that TSPAN1 levels began rising approximately 9 years before diagnosis, while GP2 levels began rising about 6 years before diagnosis.
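For readers unfamiliar with the Youden Index, the following minimal sketch shows how a cut-off can be derived by maximizing J = sensitivity + specificity - 1 over candidate thresholds. It assumes a binary outcome and the convention that a value at or above the threshold counts as "high"; the protein values are invented for illustration, and the hazard ratios reported above come from the study's Cox models, which this sketch does not reproduce.

```python
from typing import Sequence, Tuple


def youden_threshold(labels: Sequence[int], values: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) that maximizes J = sensitivity + specificity - 1.

    labels must be 0 (non-case) or 1 (case). A sample is classified as
    "high" when its value is >= threshold (an assumed convention).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one non-case")

    best_t, best_j = None, -1.0
    # Every observed value is a candidate cut-off.
    for t in sorted(set(values)):
        tp = sum(1 for y, v in zip(labels, values) if y == 1 and v >= t)
        tn = sum(1 for y, v in zip(labels, values) if y == 0 and v < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # strict '>' keeps the lowest threshold on ties
            best_t, best_j = t, j
    return best_t, best_j


# Toy data (hypothetical): 4 non-cases followed by 4 cases.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
values = [0.2, 0.5, 0.9, 1.1, 0.8, 1.3, 1.6, 2.0]

threshold, j = youden_threshold(labels, values)
high_group = [v >= threshold for v in values]
print(f"threshold = {threshold}, J = {j:.2f}, high-expression n = {sum(high_group)}")
```

Once a threshold is fixed, the resulting high and low groups could be compared in a time-to-event model to obtain a hazard ratio, which is the step the abstract reports.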

CONCLUSIONS

TSPAN1 and GP2 are promising long-term predictive biomarkers for PCa. A streamlined proteomics-based model may enable individualized risk stratification and inform earlier, less invasive screening strategies.
