Prediction of Overall and Relapse-Free Survival in Triple-Negative Breast Cancer Patients Through Machine Learning-Based Clustering on Clinical Data.

Author information

Alzate-Granados Juan Pablo, Niño Luis Fernando

Affiliations

Universidad Nacional de Colombia - Sede Bogotá, Facultad de Medicina - Depto. de Patología. Doctorado en Oncología, Bogotá, Colombia.

Universidad Nacional de Colombia - Sede Bogotá, Facultad de Ingeniería - Depto. de Ingeniería de Sistemas e Industrial. Grupo de Investigación LISI, Bogotá, Colombia.

Publication information

Clin Breast Cancer. 2025 Oct;25(7):714-719. doi: 10.1016/j.clbc.2025.07.027. Epub 2025 Jul 29.

Abstract

INTRODUCTION

Triple-negative breast cancer (TNBC) accounts for 15% to 20% of breast cancer cases and is characterized by its aggressiveness and high relapse rate. Due to the absence of hormonal receptors and HER2, standard treatment relies on chemotherapy, yielding limited outcomes in overall survival (OS) and relapse-free survival (RFS). The molecular heterogeneity of TNBC complicates risk stratification and personalized treatment approaches. In this context, unsupervised machine learning could improve the identification of clinically homogeneous subgroups and facilitate prognostic predictions.

OBJECTIVE

To develop predictive models for OS and RFS in TNBC patients using machine learning algorithms, specifically k-prototypes for subgroup identification and random forest for outcome prediction.

METHODS

A retrospective cohort study was conducted on 4808 TNBC patients diagnosed between 2012 and 2024. Clinical, demographic, and biomolecular variables were analyzed from anonymized clinical records. The k-prototypes algorithm was applied to cluster patients into groups based on shared characteristics. Subsequently, predictive models using random forest were trained and evaluated through stratified cross-validation and metrics such as AUC, sensitivity, and specificity. Cox regression was used to identify risk factors associated with mortality and relapse.
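As a rough illustration of the clustering step, the sketch below shows the mixed-type dissimilarity that k-prototypes is generally built on: squared Euclidean distance over numeric features plus a weight (here called `gamma`) times the number of mismatched categorical features, with prototypes updated as the mean of numeric and the mode of categorical features. The feature names, the toy records, and the value of `gamma` are assumptions made for illustration; the abstract does not report the variables, preprocessing, or settings actually used.

```python
from collections import Counter

def mixed_distance(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Squared Euclidean distance on numeric features plus
    gamma times the count of categorical mismatches."""
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric + gamma * categorical

def assign(records, prototypes, gamma=1.0):
    """Return, for each record, the index of its nearest prototype."""
    return [
        min(
            range(len(prototypes)),
            key=lambda k: mixed_distance(
                rec[0], rec[1], prototypes[k][0], prototypes[k][1], gamma
            ),
        )
        for rec in records
    ]

def update_prototype(members):
    """New prototype: mean of each numeric column, mode of each categorical column."""
    numeric_cols = zip(*(m[0] for m in members))
    categorical_cols = zip(*(m[1] for m in members))
    mean = [sum(col) / len(col) for col in map(list, numeric_cols)]
    mode = [Counter(col).most_common(1)[0][0] for col in categorical_cols]
    return (mean, mode)

# Hypothetical toy records, not study data.
# numeric = [scaled age, Charlson index]; categorical = [ECOG group, BRCA1/2 status]
records = [
    ([0.2, 1.0], ["0-1", "wild-type"]),
    ([0.3, 0.0], ["0-1", "wild-type"]),
    ([0.9, 4.0], [">=3", "mutated"]),
    ([0.8, 5.0], [">=3", "mutated"]),
]
prototypes = [records[0], records[2]]  # naive initialisation for the sketch

labels = assign(records, prototypes)
cluster0 = [r for r, lab in zip(records, labels) if lab == 0]
new_prototype0 = update_prototype(cluster0)
```

A full run would alternate `assign` and `update_prototype` until the labels stop changing. In practice the numeric features are usually standardised first, since `gamma` only balances the two terms sensibly when the numeric scales are comparable.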

RESULTS

Four clusters with distinct risk profiles were identified. Overall mortality was 28.8%, and relapse occurred in 40.9%, with a median follow-up time of 8.46 years. The highest-risk group exhibited a mortality rate of 42.3% and a relapse rate of 54.2%, associated with poorer functional status (ECOG ≥3) and a high prevalence of BRCA1/2 mutations (71%). The random forest model achieved 80% accuracy in mortality prediction (AUC = 0.78) and 75% accuracy in relapse prediction (AUC = 0.76). Factors such as the Charlson Comorbidity Index, ECOG, BRCA1/2 status, and PD-L1 expression were key determinants in outcome prediction.
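To make the reported metrics concrete, the following minimal sketch computes accuracy, sensitivity, specificity, and AUC from a set of true outcomes and predicted risk scores. The labels, scores, and the 0.5 decision threshold are hypothetical and unrelated to the study's data; the AUC is computed as the fraction of (event, non-event) pairs in which the event case receives the higher score, with ties counted as one half.

```python
def classification_metrics(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity, and specificity at a given score threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }

def auc(y_true, scores):
    """Pairwise (rank-based) AUC: share of positive/negative pairs
    ranked correctly, counting ties as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = event) and predicted risks, not study data.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

metrics = classification_metrics(y_true, scores)
model_auc = auc(y_true, scores)
```

Accuracy depends on the chosen threshold and on how common the event is, whereas AUC summarises ranking quality across all thresholds, which is why the two are usually reported together.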

DISCUSSION

The findings confirm the relevance of machine learning in TNBC stratification. A clinically meaningful classification was achieved, outperforming traditional models based solely on clinical or genomic variables. Comorbid burden and tumor biomarkers played crucial roles in outcome prediction. Despite its strengths, the study has limitations, including its retrospective nature and the absence of transcriptomic data. Prospective validation of these models could enhance their applicability in clinical practice.
