Comparison of pathway and gene-level models for cancer prognosis prediction.

Author information

Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, 03755, USA.

Department of Medicine, Baylor College of Medicine, Institute for Clinical and Translational Research, 1 Baylor Plaza, Houston, TX, 77030, USA.

Publication information

BMC Bioinformatics. 2020 Feb 28;21(1):76. doi: 10.1186/s12859-020-3423-z.

Abstract

BACKGROUND

Cancer prognosis prediction is valuable for patients and clinicians because it helps them manage care appropriately. A promising direction for improving the performance and interpretability of expression-based predictive models is to aggregate gene-level data into biological pathways. While many studies have used pathway-level predictors for cancer survival analysis, a comprehensive comparison of pathway-level and gene-level prognostic models has not been performed. To address this gap, we characterized the performance of penalized Cox proportional hazards models built using either pathway-level or gene-level predictors for the cancers profiled in The Cancer Genome Atlas (TCGA) and pathways from the Molecular Signatures Database (MSigDB).
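As an illustration of what aggregating gene-level data into pathway-level predictors can look like, the following is a minimal sketch. The abstract does not state which aggregation method the authors used, so the mean z-score approach here is an assumed stand-in, and the function name `pathway_scores` and its arguments are hypothetical.

```python
import numpy as np

def pathway_scores(expr, gene_names, gene_sets):
    """Collapse a samples x genes matrix into a samples x pathways matrix.

    ASSUMPTION: each pathway score is the mean z-score of its member genes.
    This is an illustrative choice, not the method reported in the paper.
    """
    expr = np.asarray(expr, dtype=float)
    # Standardize each gene across samples; constant genes get z = 0.
    mu = expr.mean(axis=0)
    sd = expr.std(axis=0)
    sd[sd == 0] = 1.0
    z = (expr - mu) / sd

    column_of = {gene: j for j, gene in enumerate(gene_names)}
    names, columns = [], []
    for name, members in gene_sets.items():
        idx = [column_of[g] for g in members if g in column_of]
        if not idx:
            continue  # skip pathways with no measured genes
        names.append(name)
        columns.append(z[:, idx].mean(axis=1))

    if not columns:
        return np.empty((expr.shape[0], 0)), names
    return np.column_stack(columns), names
```

The resulting matrix has one column per pathway rather than one per gene, which is what makes a downstream penalized model smaller and faster to fit.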

RESULTS

When analyzing TCGA data, we found that pathway-level models are more parsimonious, more robust, more computationally efficient and easier to interpret than gene-level models with similar predictive performance. For example, both pathway-level and gene-level models have an average Cox concordance index of approximately 0.85 for the TCGA glioma cohort. However, the gene-level model has twice as many predictors on average, its predictor composition is less stable across cross-validation folds, and estimation takes 40 times as long as for the pathway-level model. When the complex correlation structure of the data is broken by permutation, the pathway-level model has greater predictive performance while still retaining superior interpretative power, robustness, parsimony and computational efficiency relative to the gene-level models. For example, using survival times simulated from uncorrelated gene expression data for the TCGA glioma cohort, the average concordance index of the pathway-level model increases to 0.88 while that of the gene-level model falls to 0.56.
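To make the concordance index values above concrete, here is a minimal sketch of Harrell's concordance index for right-censored survival data. The abstract does not say which estimator the authors computed, so treating it as Harrell's C is an assumption, and the function name `concordance_index` is hypothetical.

```python
def concordance_index(times, events, risk):
    """Harrell's C for right-censored data (illustrative sketch).

    times:  observed follow-up times
    events: 1 if the event was observed, 0 if censored
    risk:   predicted risk scores (higher means shorter expected survival)

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter time than subject j. Ties in risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly and 0.5 corresponds to random ranking, so 0.56 is close to chance while 0.85 and 0.88 indicate strong discrimination.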

CONCLUSION

The results of this study show that when the correlations among gene expression values are low, pathway-level analyses can yield better predictive performance, greater interpretative power, more robust models and lower computational cost than a gene-level model. When correlations among genes are high, a pathway-level analysis provides predictive power equivalent to that of a gene-level analysis while retaining the advantages of interpretability, robustness and computational efficiency.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/74e9/7048092/b5c88dacd32f/12859_2020_3423_Fig1_HTML.jpg
