Can survival prediction be improved by merging gene expression data sets?

Author information

Swiss Institute for Experimental Cancer Research (ISREC), Swiss Federal Institute of Technology (EPFL), School of Life Sciences, EPFL SV ISREC, Lausanne, Switzerland.

Publication information

PLoS One. 2009 Oct 23;4(10):e7431. doi: 10.1371/journal.pone.0007431.

DOI:10.1371/journal.pone.0007431

PMID:19851466

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2761544/

Abstract

BACKGROUND

High-throughput gene expression profiling technologies, which generate a wealth of data, are increasingly used to characterize tumor biopsies in clinical trials. By applying machine learning algorithms to such clinically documented data sets, one hopes to improve tumor diagnosis, prognosis, and prediction of treatment response. However, the small number of patients enrolled in a single trial limits the power of machine learning approaches because of over-fitting. One could partially overcome this limitation by merging data from different studies. Nevertheless, such data sets differ from each other with regard to technical biases, patient selection criteria and follow-up treatment. It is therefore not at all clear whether the advantage of increased sample size outweighs the disadvantage of the higher heterogeneity of merged data sets. Here, we present a systematic study to answer this question specifically for breast cancer data sets. We use survival prediction based on Cox regression as an assay to measure the added value of merged data sets.
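For orientation, the standard form of the Cox proportional hazards model and the hazard ratio it implies are written out below. The abstract does not specify the covariates or fitting procedure used, so the symbols (baseline hazard h_0, coefficient vector beta, expression vectors x_i and x_j of two patients) are generic textbook notation, not the authors' own.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right),
\qquad
\mathrm{HR}(x_i, x_j) = \frac{h(t \mid x_i)}{h(t \mid x_j)}
                      = \exp\!\left(\beta^{\top}(x_i - x_j)\right)
```

For a single-gene predictor, x is a scalar and exp(beta) is the hazard ratio per unit increase in that gene's expression.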

RESULTS

Using time-dependent receiver operating characteristic area under the curve (ROC-AUC) and hazard ratio as performance measures, we saw overall no significant improvement or deterioration of survival prediction with merged data sets compared with individual data sets. This was apparently because a few genes with strong prognostic power were not available on all microarray platforms and were therefore not retained in the merged data sets. Surprisingly, the best overall performance was achieved with a single-gene predictor consisting of CYB5D1.
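The loss of genes during merging can be illustrated with a minimal sketch. It assumes each data set is a mapping from gene name to per-sample expression values and that each gene is standardized within its own data set before pooling. The standardization step, the function names and the toy gene names are illustrative assumptions and are not taken from the paper.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one gene's values within a single data set."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A constant gene carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def merge_on_common_genes(datasets):
    """Pool samples across data sets, keeping only genes present in all of them."""
    common = set.intersection(*(set(d) for d in datasets))
    merged = {}
    for gene in sorted(common):
        merged[gene] = []
        for d in datasets:
            # Standardize per data set (assumed step) to damp platform offsets.
            merged[gene].extend(zscore(d[gene]))
    return merged


# Hypothetical toy data: GENE_C is measured on platform A only.
platform_a = {
    "GENE_A": [1.0, 2.0, 3.0],
    "GENE_B": [5.0, 5.5, 6.0],
    "GENE_C": [0.1, 0.4, 0.7],
}
platform_b = {
    "GENE_A": [10.0, 30.0],
    "GENE_B": [7.0, 9.0],
}

merged = merge_on_common_genes([platform_a, platform_b])
print(sorted(merged))          # genes that survive the merge
print(len(merged["GENE_A"]))   # pooled sample count
```

In this toy case GENE_C is dropped because platform B does not measure it, which mirrors how a strongly prognostic gene can vanish from a merged data set.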

CONCLUSIONS

Merging did not deteriorate performance on average despite (a) the diversity of microarray platforms used, (b) the heterogeneity of patient cohorts, (c) the heterogeneity of breast cancer as a disease, (d) substantial variation in time to death or relapse, and (e) the reduced number of genes in the merged data sets. Predictors derived from the merged data sets were more robust, consistent and reproducible across microarray platforms. Moreover, merging data sets from different studies helps to better understand the biases of individual studies and can lead to the identification of strong survival factors such as CYB5D1 expression.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/16f3/2761544/ff14e74d397b/pone.0007431.g001.jpg
