Pooling breast cancer datasets has a synergetic effect on classification performance and improves signature stability.

Authors

van Vliet Martin H, Reyal Fabien, Horlings Hugo M, van de Vijver Marc J, Reinders Marcel J T, Wessels Lodewyk F A

Affiliation

Information and Communication Theory Group, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands.

Publication information

BMC Genomics. 2008 Aug 6;9:375. doi: 10.1186/1471-2164-9-375.

DOI:10.1186/1471-2164-9-375

PMID:18684329

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2527336/

Abstract

BACKGROUND

Michiels et al. (Lancet 2005; 365: 488-92) employed a resampling strategy to show that the genes identified as predictors of prognosis from resamplings of a single gene expression dataset are highly variable. The genes most frequently identified in the separate resamplings were put forward as a 'gold standard'. On a higher level, breast cancer datasets collected by different institutions can be considered as resamplings from the underlying breast cancer population. The limited overlap between published prognostic signatures confirms the trend of signature instability identified by the resampling strategy. Six breast cancer datasets, totaling 947 samples, all measured on the Affymetrix platform, are currently available. This provides a unique opportunity to employ a substantial dataset to investigate the effects of pooling datasets on classifier accuracy, signature stability and enrichment of functional categories.
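
The resampling idea described above can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for illustration: the simulated expression matrix, the ranking by absolute difference of class means, the signature size, and the Jaccard index as overlap measure. It is not the procedure or the data used in the cited studies. The sketch repeatedly draws training subsets from one dataset, selects a top-k gene signature from each, and reports how much the signatures overlap.

```python
import random
from itertools import combinations

def make_dataset(n_samples, n_genes, n_informative, effect, rng):
    """Synthetic expression matrix with balanced binary outcome labels (hypothetical data)."""
    labels = [i % 2 for i in range(n_samples)]
    data = [[rng.gauss(effect * y if g < n_informative else 0.0, 1.0)
             for g in range(n_genes)] for y in labels]
    return data, labels

def top_genes(data, labels, k):
    """Rank genes by absolute difference of class means; return the top-k gene indices."""
    pos = [row for row, y in zip(data, labels) if y == 1]
    neg = [row for row, y in zip(data, labels) if y == 0]
    scores = []
    for g, (pos_col, neg_col) in enumerate(zip(zip(*pos), zip(*neg))):
        diff = abs(sum(pos_col) / len(pos_col) - sum(neg_col) / len(neg_col))
        scores.append((diff, g))
    scores.sort(reverse=True)
    return {g for _, g in scores[:k]}

def resampled_signatures(data, labels, n_resamplings, train_size, k, rng):
    """Draw class-balanced training subsets and select one signature per subset."""
    idx_pos = [i for i, y in enumerate(labels) if y == 1]
    idx_neg = [i for i, y in enumerate(labels) if y == 0]
    half = train_size // 2
    signatures = []
    for _ in range(n_resamplings):
        chosen = rng.sample(idx_pos, half) + rng.sample(idx_neg, half)
        signatures.append(top_genes([data[i] for i in chosen],
                                    [labels[i] for i in chosen], k))
    return signatures

def mean_jaccard(signatures):
    """Average pairwise Jaccard overlap between signatures (1.0 means identical)."""
    overlaps = [len(a & b) / len(a | b) for a, b in combinations(signatures, 2)]
    return sum(overlaps) / len(overlaps)

rng = random.Random(0)
data, labels = make_dataset(n_samples=200, n_genes=300, n_informative=20,
                            effect=0.8, rng=rng)

# Small training sets: signatures vary strongly between resamplings.
small_overlap = mean_jaccard(resampled_signatures(data, labels, 20, 40, 20, rng))
# Large training sets: signatures converge.
large_overlap = mean_jaccard(resampled_signatures(data, labels, 20, 160, 20, rng))

print(f"mean overlap, 40 training samples:  {small_overlap:.2f}")
print(f"mean overlap, 160 training samples: {large_overlap:.2f}")
```

In this toy setting the overlap between signatures grows with the training set size, which is the kind of sample-size dependence the abstract goes on to examine with real datasets.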

RESULTS

We show that the resampling strategy produces a suboptimal ranking of genes, which cannot be considered a 'gold standard'. When pooling breast cancer datasets, we observed a synergetic effect on the classification performance in 73% of the cases. We also observed a significant positive correlation between the number of datasets that are pooled, the validation performance, the number of genes selected, and the enrichment of specific functional categories. In addition, we have evaluated the support for five explanations that have been postulated for the limited overlap of signatures.
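
As a rough illustration of what a pooling comparison could look like, the sketch below trains a nearest-centroid classifier on each of several small synthetic datasets and on their union, then scores all of them on a separate validation set. The abstract does not define "synergetic effect", so the criterion used here (pooled accuracy exceeding the best individual accuracy) is an assumption, as are the classifier, the simulated data, and all function names. Real datasets would also need cross-study normalization before pooling, which is omitted.

```python
import random

def simulate(n_per_class, n_genes, n_informative, effect, rng):
    """Return (samples, labels) for one hypothetical dataset with balanced classes."""
    samples, labels = [], []
    for y in (0, 1):
        for _ in range(n_per_class):
            samples.append([rng.gauss(effect * y if g < n_informative else 0.0, 1.0)
                            for g in range(n_genes)])
            labels.append(y)
    return samples, labels

def centroids(samples, labels):
    """Per-class mean expression profile."""
    model = {}
    for cls in (0, 1):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        model[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return model

def accuracy(model, samples, labels):
    """Fraction of samples whose nearest centroid (squared Euclidean) matches the label."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    hits = sum(1 for s, y in zip(samples, labels)
               if min(model, key=lambda c: dist(s, model[c])) == y)
    return hits / len(labels)

rng = random.Random(1)
# Three small training datasets and one larger validation set (all simulated).
train_sets = [simulate(15, 100, 10, 1.0, rng) for _ in range(3)]
valid_x, valid_y = simulate(100, 100, 10, 1.0, rng)

individual_acc = [accuracy(centroids(x, y), valid_x, valid_y) for x, y in train_sets]

# Pool by concatenating samples; assumes all datasets share genes and scale.
pooled_x = [s for x, _ in train_sets for s in x]
pooled_y = [label for _, y in train_sets for label in y]
pooled_acc = accuracy(centroids(pooled_x, pooled_y), valid_x, valid_y)

# Assumed criterion: pooled classifier beats the best single-dataset classifier.
is_synergetic = pooled_acc > max(individual_acc)

print("individual accuracies:", [round(a, 3) for a in individual_acc])
print("pooled accuracy:", round(pooled_acc, 3))
print("synergy under the assumed criterion:", is_synergetic)
```

Repeating such a comparison over many combinations of datasets would give a proportion of cases with synergy, comparable in spirit to the percentage reported above.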

CONCLUSION

The limited overlap of current signature genes can be attributed to small sample size. Pooling datasets results in more accurate classification and a convergence of signature genes. We therefore advocate the analysis of new data within the context of a compendium, rather than analysis in isolation.
