Predictive modelling using pathway scores: robustness and significance of pathway collections.

Affiliations

Computational and Systems Medicine, Department of Surgery and Cancer, Sir Alexander Fleming building, Imperial College, London, SW7 2AZ, UK.

Division of Cancer, Department of Surgery and Cancer, Imperial College London, Hammersmith Hospital Campus, W12 0NN, London, UK.

Publication information

BMC Bioinformatics. 2019 Nov 4;20(1):543. doi: 10.1186/s12859-019-3163-0.

DOI:10.1186/s12859-019-3163-0

PMID:31684857

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6827178/

Abstract

BACKGROUND

Transcriptomic data is often used to build statistical models which are predictive of a given phenotype, such as disease status. Genes work together in pathways and it is widely thought that pathway representations will be more robust to noise in the gene expression levels. We aimed to test this hypothesis by constructing models based on either genes alone, or based on sample specific scores for each pathway, thus transforming the data to a 'pathway space'. We progressively degraded the raw data by addition of noise and examined the ability of the models to maintain predictivity.
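The transformation to a 'pathway space' and the progressive degradation of the data can be illustrated with a minimal sketch. It assumes a pathway score is the mean z-score of the pathway's member genes and that degradation means adding zero-mean Gaussian noise; neither choice is confirmed by the abstract, and the function names `pathway_scores` and `add_noise` are hypothetical.

```python
import random
import statistics


def pathway_scores(expression, pathways):
    """Map gene-level data to per-sample pathway scores.

    expression: dict mapping gene -> list of per-sample values
    pathways:   dict mapping pathway name -> list of member genes
    Scoring rule (an assumption, not the paper's stated method):
    the mean z-score of the measured member genes.
    """
    n_samples = len(next(iter(expression.values())))
    z = {}
    for gene, values in expression.items():
        mu = statistics.fmean(values)
        sd = statistics.pstdev(values)
        # A constant gene carries no signal; give it a z-score of 0.
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]

    scores = {}
    for name, genes in pathways.items():
        members = [g for g in genes if g in z]
        if not members:
            continue  # skip pathways with no measured genes
        scores[name] = [
            statistics.fmean([z[g][i] for g in members])
            for i in range(n_samples)
        ]
    return scores


def add_noise(expression, sigma, rng):
    """Degrade the data with zero-mean Gaussian noise (assumed noise model)."""
    return {
        gene: [v + rng.gauss(0.0, sigma) for v in values]
        for gene, values in expression.items()
    }


if __name__ == "__main__":
    rng = random.Random(0)
    expression = {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0, 6.0], "C": [3.0, 1.0, 2.0]}
    pathways = {"P1": ["A", "B"], "P2": ["B", "C"]}
    # Increase the noise level step by step and rescore each time.
    for sigma in (0.0, 0.5, 1.0):
        noisy = add_noise(expression, sigma, rng)
        print(sigma, pathway_scores(noisy, pathways))
```

Under this sketch, a classifier would be trained once on the gene-level data and once on the pathway scores, and the two would be compared as `sigma` grows.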

RESULTS

Models in the pathway space indeed had higher predictive robustness than models in the gene space. This result was independent of the workflow, parameters, classifier and data set used. Surprisingly, randomised pathway mappings produced models of similar accuracy and robustness to true mappings, suggesting that the success of pathway space models is not conferred by the specific definitions of the pathway. Instead, predictive models built on the true pathway mappings led to prediction rules with fewer influential pathways than those built on randomised pathways. The extent of this effect was used to differentiate pathway collections coming from a variety of widely used pathway databases.
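The comparison between true and randomised pathway mappings can be sketched as follows. This is an illustration under stated assumptions: the randomisation shown here permutes gene labels across the collection (preserving pathway sizes), and an "influential" pathway is taken to be one whose absolute model weight is at least a chosen fraction of the largest. The abstract specifies neither rule, and `randomise_pathways` and `count_influential` are hypothetical names.

```python
import random


def randomise_pathways(pathways, rng):
    """Permute gene labels across the whole collection.

    Each pathway keeps its size and each pair of pathways keeps its
    overlap size, but the biological meaning of membership is destroyed.
    (Assumed randomisation scheme; the paper's scheme may differ.)
    """
    genes = sorted({g for members in pathways.values() for g in members})
    shuffled = genes[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(genes, shuffled))
    return {
        name: [relabel[g] for g in members]
        for name, members in pathways.items()
    }


def count_influential(weights, fraction=0.1):
    """Count features whose |weight| is at least `fraction` of the largest.

    weights: dict mapping pathway name -> model coefficient.
    The threshold rule is an assumption made for illustration.
    """
    largest = max(abs(w) for w in weights.values())
    if largest == 0:
        return 0
    return sum(1 for w in weights.values() if abs(w) >= fraction * largest)


if __name__ == "__main__":
    rng = random.Random(1)
    true_pathways = {"P1": ["a", "b", "c"], "P2": ["c", "d"], "P3": ["e"]}
    print(randomise_pathways(true_pathways, rng))
    print(count_influential({"P1": 1.0, "P2": 0.05, "P3": -0.5}))
```

In this framing, a model would be fitted on scores from the true mapping and on scores from many randomised mappings, and the counts of influential pathways compared between the two.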

CONCLUSIONS

Prediction models based on pathway scores are more robust to degradation of gene expression information than the equivalent models based on ungrouped genes. While models based on true pathway scores are not more robust or accurate than those based on randomised pathways, true pathways produced simpler prediction rules, emphasizing a smaller number of pathways.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17ae/6827178/3a1bbec10715/12859_2019_3163_Fig1_HTML.jpg
