Integrating biological knowledge with gene expression profiles for survival prediction of cancer.

Author information

Chen Xi, Wang Lily

Affiliation

Department of Quantitative Health Sciences, The Cleveland Clinic, Cleveland, OH 44195, USA.

Publication information

J Comput Biol. 2009 Feb;16(2):265-78. doi: 10.1089/cmb.2008.12TT.

DOI:10.1089/cmb.2008.12TT

PMID:19183004

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3198940/

Abstract

Due to the large variability in survival times among cancer patients and the plethora of genes on microarrays unrelated to outcome, building accurate prediction models that are easy to interpret remains a challenge. In this paper, we propose a general strategy for improving the performance and interpretability of prediction models by integrating gene expression data with prior biological knowledge. First, we link gene identifiers in the expression dataset with gene annotation databases such as Gene Ontology (GO). Then we construct "supergenes" for each gene category by summarizing information from genes related to outcome using a modified principal component analysis (PCA) method. Finally, instead of using genes as predictors, we use these supergenes, each representing the information from one gene category, as predictors of survival outcome. In addition to identifying gene categories associated with outcome, the proposed approach also carries out within-category selection to identify important genes within each gene set. We show, using two real breast cancer microarray datasets, that prediction models constructed from gene set (or pathway) information outperform prediction models based on expression values of single genes, with improved prediction accuracy and interpretability.
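The three steps in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the "modified PCA", so the sketch substitutes a simple supervised variant (screen genes by a univariate Cox score statistic, then take the first principal component of the retained genes). The function names, the threshold value, and the category labels are all hypothetical, and the gene-to-category mapping is assumed to be already available as a dictionary of column indices.

```python
import numpy as np


def cox_score(x, time, event):
    """Univariate Cox score test statistic at beta = 0 (assumes no tied times)."""
    order = np.argsort(-time)              # descending time: prefix = risk set
    x, event = x[order], event[order]
    n_at_risk = np.arange(1, len(x) + 1)
    mean_x = np.cumsum(x) / n_at_risk      # mean of x over each risk set
    mean_x2 = np.cumsum(x ** 2) / n_at_risk
    is_event = event == 1
    u = np.sum(x[is_event] - mean_x[is_event])
    v = np.sum(mean_x2[is_event] - mean_x[is_event] ** 2)
    return u / np.sqrt(v) if v > 0 else 0.0


def build_supergene(expr, time, event, threshold=2.0):
    """Summarize one gene category (samples x genes) into a single supergene.

    Returns (supergene, keep), where keep marks the genes that passed the
    within-category screen. supergene is None if no gene passes.
    """
    scores = np.array([cox_score(expr[:, j], time, event)
                       for j in range(expr.shape[1])])
    keep = np.abs(scores) >= threshold     # within-category gene selection
    if not keep.any():
        return None, keep
    sub = expr[:, keep]
    sub = sub - sub.mean(axis=0)           # center before PCA
    u, s, _ = np.linalg.svd(sub, full_matrices=False)
    return u[:, 0] * s[0], keep            # first principal component scores


def supergene_matrix(expr, categories, time, event, threshold=2.0):
    """Build one supergene per category; categories maps name -> column indices."""
    columns, names = [], []
    for name, gene_idx in categories.items():
        sg, _ = build_supergene(expr[:, gene_idx], time, event, threshold)
        if sg is not None:
            columns.append(sg)
            names.append(name)
    if not columns:
        return np.empty((expr.shape[0], 0)), names
    return np.column_stack(columns), names


# Synthetic demo: gene 0 drives survival, the other genes are noise.
rng = np.random.default_rng(0)
n = 100
expr = rng.normal(size=(n, 6))
time = np.exp(-expr[:, 0] + 0.1 * rng.normal(size=n))
event = np.ones(n, dtype=int)              # no censoring in this toy example
categories = {"hypothetical_category_A": [0, 1, 2],
              "hypothetical_category_B": [3, 4, 5]}
X, names = supergene_matrix(expr, categories, time, event)
print(names, X.shape)
```

The resulting matrix `X` has one column per retained gene category and would take the place of the gene-level expression matrix as input to a survival model such as Cox regression. The boolean mask returned by `build_supergene` records which genes contributed to each supergene, which is where the interpretability benefit described in the abstract would come from.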
