A comparative study of survival models for breast cancer prognostication based on microarray data: does a single gene beat them all?

Authors

Haibe-Kains B, Desmedt C, Sotiriou C, Bontempi G

Affiliation

Machine Learning Group, Department of Computer Science, Institut Jules Bordet, Université Libre de Bruxelles, Brussels, Belgium.

Publication

Bioinformatics. 2008 Oct 1;24(19):2200-8. doi: 10.1093/bioinformatics/btn374. Epub 2008 Jul 17.

DOI:10.1093/bioinformatics/btn374

PMID:18635567

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2553442/

Abstract

MOTIVATION

Survival prediction of breast cancer (BC) patients independently of treatment, also known as prognostication, is a complex task since clinically similar breast tumors, in addition to being molecularly heterogeneous, may exhibit different clinical outcomes. In recent years, the analysis of gene expression profiles by means of sophisticated data mining tools has emerged as a promising technology to bring additional insights into BC biology and to improve the quality of prognostication. The aim of this work is to assess quantitatively the accuracy of prediction obtained with state-of-the-art data analysis techniques for BC microarray data through an independent and thorough framework.

RESULTS

Due to the large number of variables, the small number of samples and the high degree of noise, complex prediction methods are highly exposed to performance degradation despite the use of cross-validation techniques. Our analysis shows that the most complex methods are not significantly better than the simplest one, a univariate model relying on a single proliferation gene. This result suggests that proliferation might be the most relevant biological process for BC prognostication and that the loss of interpretability deriving from the use of overly complex methods may not be sufficiently counterbalanced by an improvement in the quality of prediction.
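To make the idea of scoring a single-gene univariate model concrete, the following is a minimal sketch rather than the authors' method. The abstract does not name the performance measure, so the use of Harrell's concordance index here is an assumption, and the function `concordance_index` and all data values are hypothetical and invented for illustration. It does not call the survcomp package.

```python
from itertools import combinations


def concordance_index(times, events, scores):
    """Harrell's C for right-censored data (illustrative sketch).

    A pair of patients is usable when the one with the shorter observed
    time had an event. The pair is concordant when that patient also has
    the higher risk score. Tied scores count as half concordant.
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied observed times are skipped for simplicity
        # a is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the order is unknown
        usable += 1
        if scores[a] > scores[b]:
            concordant += 1.0
        elif scores[a] == scores[b]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical toy cohort: follow-up time in years, event indicator
# (1 = event observed, 0 = censored), and the expression of one
# proliferation gene used directly as the risk score.
times = [2.0, 3.5, 5.0, 6.0, 8.0, 10.0]
events = [1, 1, 0, 1, 0, 0]
gene_expr = [2.9, 2.1, 1.8, 2.4, 1.0, 0.7]

c_index = concordance_index(times, events, gene_expr)
print(round(c_index, 3))  # 0.909 (10 concordant pairs out of 11 usable)
```

A value of 0.5 corresponds to a random ranking and 1.0 to a perfect one. Under this assumed measure, comparing a single-gene score against a multi-gene model would mean computing the same index for both on the same held-out patients.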

AVAILABILITY

The comparison study is implemented in an R package called survcomp and is available from http://www.ulb.ac.be/di/map/bhaibeka/software/survcomp/.

Similar articles

A comparative study of survival models for breast cancer prognostication based on microarray data: does a single gene beat them all?

Bioinformatics. 2008 Oct 1;24(19):2200-8. doi: 10.1093/bioinformatics/btn374. Epub 2008 Jul 17.

Mixture classification model based on clinical markers for breast cancer prognosis.

Artif Intell Med. 2010 Feb-Mar;48(2-3):129-37. doi: 10.1016/j.artmed.2009.07.008. Epub 2009 Dec 14.

Genefu: an R/Bioconductor package for computation of gene expression-based signatures in breast cancer.

Bioinformatics. 2016 Apr 1;32(7):1097-9. doi: 10.1093/bioinformatics/btv693. Epub 2015 Nov 24.

Cross-study validation for the assessment of prediction algorithms.

Bioinformatics. 2014 Jun 15;30(12):i105-12. doi: 10.1093/bioinformatics/btu279.

Gene-set activity toolbox (GAT): A platform for microarray-based cancer diagnosis using an integrative gene-set analysis approach.

J Bioinform Comput Biol. 2016 Aug;14(4):1650015. doi: 10.1142/S0219720016500153. Epub 2016 Mar 15.

survcomp: an R/Bioconductor package for performance assessment and comparison of survival models.

Bioinformatics. 2011 Nov 15;27(22):3206-8. doi: 10.1093/bioinformatics/btr511. Epub 2011 Sep 7.

Integrating biological knowledge with gene expression profiles for survival prediction of cancer.

J Comput Biol. 2009 Feb;16(2):265-78. doi: 10.1089/cmb.2008.12TT.

Interactively optimizing signal-to-noise ratios in expression profiling: project-specific algorithm selection and detection p-value weighting in Affymetrix microarrays.

Bioinformatics. 2004 Nov 1;20(16):2534-44. doi: 10.1093/bioinformatics/bth280. Epub 2004 Apr 29.

NCC-AUC: an AUC optimization method to identify multi-biomarker panel for cancer prognosis from genomic and clinical data.

Bioinformatics. 2015 Oct 15;31(20):3330-8. doi: 10.1093/bioinformatics/btv374. Epub 2015 Jun 18.

Gene expression profiling of breast tumor cell lines to predict for therapeutic response to microtubule-stabilizing agents.

Breast Cancer Res Treat. 2012 Apr;132(3):1035-47. doi: 10.1007/s10549-011-1687-8. Epub 2011 Jul 27.

Cited by

Development of a predictive model for long-term care and support eligibility certification associated with dementia-related functional impairments: the IRIDE cohort study.

BMC Public Health. 2025 Aug 21;25(1):2867. doi: 10.1186/s12889-025-23681-5.

Deep learning informed multimodal fusion of radiology and pathology to predict outcomes in HPV-associated oropharyngeal squamous cell carcinoma.

EBioMedicine. 2025 Apr;114:105663. doi: 10.1016/j.ebiom.2025.105663. Epub 2025 Mar 22.

Advantage of Log Odds of Metastatic Lymph Nodes After Curative-Intent Resection of Gallbladder Cancer.

Ann Surg Oncol. 2025 Mar;32(3):1742-1751. doi: 10.1245/s10434-024-16492-2. Epub 2024 Nov 14.

Deep learning survival model predicts outcome after intracerebral hemorrhage from initial CT scan.

Eur Stroke J. 2025 Mar;10(1):225-235. doi: 10.1177/23969873241260154. Epub 2024 Jun 16.

Integrative bioinformatics approach yields a novel gene expression risk model for prognosis and progression prediction in prostate cancer.

J Cell Mol Med. 2024 Jun;28(11):e18405. doi: 10.1111/jcmm.18405.

Metabolic pathway-based subtypes associate glycan biosynthesis and treatment response in head and neck cancer.

NPJ Precis Oncol. 2024 May 23;8(1):116. doi: 10.1038/s41698-024-00602-0.

Proteome-wide Mendelian randomization identifies causal plasma proteins in lung cancer.

iScience. 2024 Jan 20;27(2):108985. doi: 10.1016/j.isci.2024.108985. eCollection 2024 Feb 16.

Added prognostic value of 3D deep learning-derived features from preoperative MRI for adult-type diffuse gliomas.

Neuro Oncol. 2024 Mar 4;26(3):571-580. doi: 10.1093/neuonc/noad202.

Multitask Learning with Convolutional Neural Networks and Vision Transformers Can Improve Outcome Prediction for Head and Neck Cancer Patients.

Cancers (Basel). 2023 Oct 9;15(19):4897. doi: 10.3390/cancers15194897.

Algorithmically Reconstructed Molecular Pathways as the New Generation of Prognostic Molecular Biomarkers in Human Solid Cancers.

Proteomes. 2023 Aug 25;11(3):26. doi: 10.3390/proteomes11030026.
