Machine Learning With K-Means Dimensional Reduction for Predicting Survival Outcomes in Patients With Breast Cancer.

Authors

Zhao Melissa, Tang Yushi, Kim Hyunkyung, Hasegawa Kohei

Affiliations

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.

Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.

Publication

Cancer Inform. 2018 Nov 9;17:1176935118810215. doi: 10.1177/1176935118810215. eCollection 2018.

DOI:10.1177/1176935118810215
PMID:30455569
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6238199/
Abstract

OBJECTIVE

Despite existing prognostic markers, breast cancer prognosis remains a difficult subject due to the complex relationships between many contributing factors and survival. This study seeks to integrate multiple clinicopathological and genomic factors with dimensional reduction across machine learning algorithms to compare survival predictions.

METHODS

This is a secondary analysis of the data from a prospective cohort study of female patients with breast cancer enrolled in the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC). We constructed a series of predictive models: ensemble models (Gradient Boosting and Random Forest), support vector machine (SVM), and artificial neural networks (ANN) for 5-year survival based on clinicopathological and gene expression data after K-means clustering with K-nearest-neighbor (KNN) classification. Model performance was evaluated by receiver operating characteristic (ROC) curve, accuracy, and calibration slope (CS). Model stability was assessed over 10 random runs in terms of ROC, accuracy, CS, and variable importance.
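The dimensional-reduction step described in METHODS can be pictured with the plain-Python sketch below. K-means is fitted on the training expression profiles only. Each validation profile is then assigned to a cluster by a K-nearest-neighbor vote over the training profiles, and, as an assumption, the cluster label is one-hot encoded as a compact genomic feature next to clinical covariates. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not report these details, so the following are all hypothetical choices: the toy two-gene data, K = 2, the 3-neighbor vote, Euclidean distance, the deterministic farthest-point initialization, and the helper names `kmeans`, `knn_assign`, and `one_hot`.

```python
from collections import Counter
from math import dist  # Euclidean distance (Python 3.8+)


def kmeans(points, k, n_iter=100):
    """Lloyd's K-means fitted on training profiles only; returns (centroids, labels)."""
    # Assumed initialization (deterministic farthest-point / maximin): start from the
    # first profile, then add the profile farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are final
        labels = new_labels
        # Update step: move each centroid to the mean of its member profiles.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def knn_assign(train_points, train_labels, query_points, n_neighbors=3):
    """Give each validation profile the majority cluster of its nearest training profiles."""
    assigned = []
    for q in query_points:
        order = sorted(range(len(train_points)), key=lambda i: dist(q, train_points[i]))
        votes = Counter(train_labels[i] for i in order[:n_neighbors])
        assigned.append(votes.most_common(1)[0][0])
    return assigned


def one_hot(labels, k):
    """Encode cluster membership as k indicator columns (assumed feature encoding)."""
    return [[1 if lab == c else 0 for c in range(k)] for lab in labels]


# Hypothetical toy data: two "genes", two clearly separated expression groups.
train_expr = [[0.1, 0.2], [0.0, -0.1], [-0.2, 0.1], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
valid_expr = [[0.05, 0.0], [5.1, 5.05]]
K = 2  # illustrative only; the abstract does not state the number of clusters

_, train_clusters = kmeans(train_expr, K)
valid_clusters = knn_assign(train_expr, train_clusters, valid_expr, n_neighbors=3)

# Hypothetical clinical covariate (age) followed by the one-hot cluster indicators.
valid_clinical = [[62.0], [55.0]]
valid_features = [row + oh for row, oh in zip(valid_clinical, one_hot(valid_clusters, K))]
print(train_clusters, valid_clusters, valid_features)
# [0, 0, 0, 1, 1, 1] [0, 1] [[62.0, 1, 0], [55.0, 0, 1]]
```

Because the centroids come from the training split alone, the validation profiles never influence the reduction step. This matters when the whole procedure is repeated over several random training/validation splits.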

RESULTS

The analytic cohort is composed of 1874 patients with breast cancer. Overall, the median age was 62 years; the 5-year survival rate was 75%. ROC and accuracy were not significantly different between models (ROC and accuracy around 0.67 and 0.72 across models, respectively). However, ensemble methods resulted in better fit (CS) with stable measures of variable importance across 10 random training/validation splits. K-means clustering of gene expression profiles on training data points along with KNN classification of validation data points was a robust method of dimensional reduction. Furthermore, the gene expression cluster with the highest mortality risk was an influential factor in model prediction.
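The three performance measures reported above can be computed as in the sketch below. This is a hedged illustration, not the study's evaluation code, and it rests on three assumptions. First, "ROC" is read as the area under the ROC curve, computed here with the equivalent Mann-Whitney pairwise formula. Second, accuracy uses a 0.5 probability threshold. Third, the calibration slope follows the common definition: the coefficient b in a logistic regression of the observed outcome on logit(predicted probability), fitted here by Newton-Raphson. Under that definition b near 1 indicates good calibration, and b below 1 indicates predictions that are too extreme. The function names and toy data are hypothetical.

```python
from math import exp, log


def roc_auc(y, p):
    """Area under the ROC curve via the Mann-Whitney pairwise formula (ties count 1/2)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y, p, threshold=0.5):
    """Share of patients whose thresholded prediction matches the observed outcome."""
    return sum((pi >= threshold) == (yi == 1) for yi, pi in zip(y, p)) / len(y)


def calibration_slope(y, p, n_iter=50, tol=1e-10):
    """Assumed definition: slope b in logit P(y=1) = a + b * logit(p), fitted by Newton-Raphson."""
    x = [log(pi / (1 - pi)) for pi in p]  # linear predictor = logit of predicted risk
    a, b = 0.0, 1.0  # start from "perfectly calibrated"
    for _ in range(n_iter):
        ga = gb = iaa = iab = ibb = 0.0  # score vector and Fisher information entries
        for yi, xi in zip(y, x):
            mu = 1.0 / (1.0 + exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            ga += yi - mu
            gb += (yi - mu) * xi
            iaa += w
            iab += w * xi
            ibb += w * xi * xi
        det = iaa * ibb - iab * iab
        da = (ibb * ga - iab * gb) / det  # Newton step = inverse information @ score
        db = (iaa * gb - iab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return b


# Hypothetical toy example: y = 1 marks the predicted 5-year outcome (coding assumed).
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y, p), accuracy(y, p))  # 0.75 0.75

# Hypothetical grouped data: observed event rates 0.2, 0.5, 0.8 in three groups of 10.
y_obs = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
p_calibrated = [0.2] * 10 + [0.5] * 10 + [0.8] * 10  # matches observed rates: slope 1
p_overconfident = [0.05] * 10 + [0.5] * 10 + [0.95] * 10  # too extreme: slope below 1
print(calibration_slope(y_obs, p_calibrated), calibration_slope(y_obs, p_overconfident))
```

The two calibration cases show why discrimination and calibration can disagree. Both prediction sets rank the groups identically, so they share the same AUC, but only the first has a slope close to 1.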

CONCLUSIONS

Using machine learning methods to construct predictive models for 5-year survival in patients with breast cancer, we demonstrated discrimination ability across models with new insight into the stability and utility of dimensional reduction on genomic features in breast cancer survival prediction.

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/daa3/6238199/f752d787b3e0/10.1177_1176935118810215-fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/daa3/6238199/1f0e7b36f882/10.1177_1176935118810215-fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/daa3/6238199/acf732dab765/10.1177_1176935118810215-fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/daa3/6238199/9c046e63a8ef/10.1177_1176935118810215-fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/daa3/6238199/9264da7a5f66/10.1177_1176935118810215-fig5.jpg

Similar articles

1
Machine Learning With K-Means Dimensional Reduction for Predicting Survival Outcomes in Patients With Breast Cancer.
Cancer Inform. 2018 Nov 9;17:1176935118810215. doi: 10.1177/1176935118810215. eCollection 2018.
2
Comparing supervised and semi-supervised Machine Learning Models on Diagnosing Breast Cancer.
Ann Med Surg (Lond). 2021 Jan 8;62:53-64. doi: 10.1016/j.amsu.2020.12.043. eCollection 2021 Feb.
3
Machine learning models in breast cancer survival prediction.
Technol Health Care. 2016;24(1):31-42. doi: 10.3233/THC-151071.
4
Hubness weighted SVM ensemble for prediction of breast cancer subtypes.
Technol Health Care. 2022;30(3):565-578. doi: 10.3233/THC-212825.
5
The Application and Comparison of Machine Learning Models for the Prediction of Breast Cancer Prognosis: Retrospective Cohort Study.
JMIR Med Inform. 2022 Feb 18;10(2):e33440. doi: 10.2196/33440.
6
Do we need different machine learning algorithms for QSAR modeling? A comprehensive assessment of 16 machine learning algorithms on 14 QSAR data sets.
Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa321.
7
Application of machine learning approaches for osteoporosis risk prediction in postmenopausal women.
Arch Osteoporos. 2020 Oct 23;15(1):169. doi: 10.1007/s11657-020-00802-8.
8
Predicting Characteristics Associated with Breast Cancer Survival Using Multiple Machine Learning Approaches.
Comput Math Methods Med. 2022 Apr 25;2022:1249692. doi: 10.1155/2022/1249692. eCollection 2022.
9
Predicting factors for survival of breast cancer patients using machine learning techniques.
BMC Med Inform Decis Mak. 2019 Mar 22;19(1):48. doi: 10.1186/s12911-019-0801-4.
10
Predicting in-hospital mortality in ICU patients with sepsis using gradient boosting decision tree.
Medicine (Baltimore). 2021 May 14;100(19):e25813. doi: 10.1097/MD.0000000000025813.

Cited by

1
Artificial intelligence in breast cancer survival prediction: a comprehensive systematic review and meta-analysis.
Front Oncol. 2025 Jan 7;14:1420328. doi: 10.3389/fonc.2024.1420328. eCollection 2024.
2
Application of machine learning in breast cancer survival prediction using a multimethod approach.
Sci Rep. 2024 Dec 3;14(1):30147. doi: 10.1038/s41598-024-81734-y.
3
Machine Learning Model Construction and Testing: Anticipating Cancer Incidence and Mortality.
Diseases. 2024 Jun 30;12(7):139. doi: 10.3390/diseases12070139.
4
Polygenic Risk Score for Cardiovascular Diseases in Artificial Intelligence Paradigm: A Review.
J Korean Med Sci. 2023 Nov 27;38(46):e395. doi: 10.3346/jkms.2023.38.e395.
5
Multiomic Investigations into Lung Health and Disease.
Microorganisms. 2023 Aug 19;11(8):2116. doi: 10.3390/microorganisms11082116.
6
Development of a Machine Learning Model to Predict Recurrence of Oral Tongue Squamous Cell Carcinoma.
Cancers (Basel). 2023 May 16;15(10):2769. doi: 10.3390/cancers15102769.
7
Joint learning sample similarity and correlation representation for cancer survival prediction.
BMC Bioinformatics. 2022 Dec 19;23(1):553. doi: 10.1186/s12859-022-05110-1.
8
Biomedical Application of Identified Biomarkers Gene Expression Based Early Diagnosis and Detection in Cervical Cancer with Modified Probabilistic Neural Network.
Contrast Media Mol Imaging. 2022 Sep 10;2022:4946154. doi: 10.1155/2022/4946154. eCollection 2022.
9
Deep Learning Mechanism for Predicting the Axillary Lymph Node Metastasis in Patients with Primary Breast Cancer.
Biomed Res Int. 2022 Aug 10;2022:8616535. doi: 10.1155/2022/8616535. eCollection 2022.
10
Combining Molecular, Imaging, and Clinical Data Analysis for Predicting Cancer Prognosis.
Cancers (Basel). 2022 Jun 30;14(13):3215. doi: 10.3390/cancers14133215.