Module-based outcome prediction using breast cancer compendia.

Authors

van Vliet Martin H, Klijn Christiaan N, Wessels Lodewyk F A, Reinders Marcel J T

Affiliation

Information and Communication Theory Group, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands.

Publication information

PLoS One. 2007 Oct 17;2(10):e1047. doi: 10.1371/journal.pone.0001047.

DOI:10.1371/journal.pone.0001047

PMID:17940611

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2002511/

Abstract

BACKGROUND

Large collections of microarray datasets (compendia) and knowledge about the grouping of genes into pathways (gene sets) are typically not exploited when training predictors of disease outcome. Both can be useful: a compendium increases the number of samples, while gene sets reduce the size of the feature space. From a machine learning perspective, this should be favorable and result in more robust predictors.

METHODOLOGY

We extracted modules of regulated genes from gene sets and compendia. Through supervised analysis, we constructed predictors that employ modules predictive of breast cancer outcome. To validate these predictors, we applied them to independent data from the same institution (intra-dataset) and from other institutions (inter-dataset).
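As a rough illustration of what a module-based predictor can look like, the sketch below collapses per-gene expression into one score per module and classifies a new sample by its nearest class centroid in module space. This is not the authors' procedure: the abstract does not state how modules are scored or which classifier is used, so the mean-of-member-genes score, the nearest-centroid rule, the gene sets, the gene symbols chosen as members, and all expression values are assumptions made up for this example.

```python
from statistics import mean


def module_scores(expression, gene_sets):
    """Collapse per-gene expression into one score per module.

    Assumption: a module's score is the mean expression of its member genes
    that were measured in this sample.
    """
    scores = {}
    for name, genes in gene_sets.items():
        present = [g for g in genes if g in expression]
        if present:
            scores[name] = mean(expression[g] for g in present)
    return scores


def train_centroids(samples, labels, gene_sets):
    """Average the module scores of the training samples within each class."""
    by_class = {}
    for expr, label in zip(samples, labels):
        by_class.setdefault(label, []).append(module_scores(expr, gene_sets))
    centroids = {}
    for label, rows in by_class.items():
        modules = set().union(*rows)
        centroids[label] = {
            m: mean(r[m] for r in rows if m in r) for m in modules
        }
    return centroids


def predict(expression, centroids, gene_sets):
    """Return the class whose centroid is closest in module space."""
    scores = module_scores(expression, gene_sets)

    def squared_distance(centroid):
        return sum((scores[m] - centroid[m]) ** 2 for m in scores if m in centroid)

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Hypothetical gene sets and toy expression values, for illustration only.
gene_sets = {
    "cell_cycle": ["CCNB1", "CDK1", "MKI67"],
    "glycolysis": ["PGK1", "ENO1"],
}
train = [
    {"CCNB1": 2.0, "CDK1": 1.8, "MKI67": 2.2, "PGK1": 1.0, "ENO1": 1.2},
    {"CCNB1": 1.6, "CDK1": 2.0, "MKI67": 1.8, "PGK1": 1.4, "ENO1": 0.8},
    {"CCNB1": -1.0, "CDK1": -0.8, "MKI67": -1.2, "PGK1": -0.2, "ENO1": 0.0},
    {"CCNB1": -0.6, "CDK1": -1.0, "MKI67": -0.8, "PGK1": 0.2, "ENO1": -0.4},
]
labels = ["poor", "poor", "good", "good"]

centroids = train_centroids(train, labels, gene_sets)
new_sample = {"CCNB1": 1.5, "CDK1": 1.7, "MKI67": 1.9, "PGK1": 0.9, "ENO1": 0.7}
prediction = predict(new_sample, centroids, gene_sets)
```

With only two module features instead of five gene features, the toy classifier works in a much smaller feature space, which is the motivation given in the Background. A real analysis would also need the module extraction and validation steps described in the paper.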

CONCLUSIONS

We show that modules derived from single breast cancer datasets achieve better performance on the validation data than gene-based predictors. We also show that there is a trend in compendium specificity and predictive performance: modules derived from a single breast cancer dataset and from a breast cancer-specific compendium perform better than those derived from a human cancer compendium. Additionally, the module-based predictor provides much richer insight into the underlying biology. Frequently selected gene sets are associated with processes such as cell cycle, E2F regulation, DNA damage response, proteasome and glycolysis. We analyzed two modules, related to cell cycle and to the OCT1 transcription factor, respectively. Individually, each of these modules provides a significant separation into survival subgroups on both the training and the independent validation data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/46f7/2002511/60b664845dcf/pone.0001047.g001.jpg
