A hierarchical spike-and-slab model for pan-cancer survival using pan-omic data.

Affiliations

Division of Biostatistics, University of Minnesota, Minneapolis, USA.

Department of Genetics, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, USA.

Publication information

BMC Bioinformatics. 2022 Jun 17;23(1):235. doi: 10.1186/s12859-022-04770-3.

Abstract

BACKGROUND

Pan-omics, pan-cancer analysis has advanced our understanding of the molecular heterogeneity of cancer. However, such analyses have been limited in their ability to use information from multiple sources of data (e.g., omics platforms) and multiple sample sets (e.g., cancer types) to predict clinical outcomes. We address the issue of prediction across multiple high-dimensional sources of data and sample sets by using molecular patterns identified by BIDIFAC+, a method for integrative dimension reduction of bidimensionally-linked matrices, in a Bayesian hierarchical model. Our model performs variable selection through spike-and-slab priors that borrow information across clustered data. We use this model to predict overall patient survival from the Cancer Genome Atlas with data from 29 cancer types and 4 omics sources and use simulations to characterize the performance of the hierarchical spike-and-slab prior.
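To make the idea of a spike-and-slab prior that borrows information across clustered data more concrete, here is a minimal Python sketch of one generic construction. It is not the specification used in the paper, which the abstract does not give. The Beta hyperprior, the point-mass spike at zero, the Gaussian slab, and every name in the code (`sample_hierarchical_spike_slab`, `a`, `b`, `slab_sd`) are assumptions chosen for illustration. Each molecular pattern gets one inclusion probability shared by all cancers, so evidence that a pattern matters in one cancer raises its prior chance of being selected in the others.

```python
import numpy as np


def sample_hierarchical_spike_slab(n_cancers, n_patterns, a=1.0, b=1.0,
                                   slab_sd=1.0, seed=0):
    """Draw coefficients from an illustrative hierarchical spike-and-slab prior.

    Assumed structure (hypothetical, not taken from the paper):
      pi_k       ~ Beta(a, b)          shared inclusion probability per pattern
      gamma_jk   ~ Bernoulli(pi_k)     inclusion indicator per cancer and pattern
      beta_jk    = 0                   if gamma_jk = 0 (spike)
      beta_jk    ~ Normal(0, slab_sd)  if gamma_jk = 1 (slab)
    """
    rng = np.random.default_rng(seed)

    # One inclusion probability per pattern, shared by every cancer.
    # This shared level is what lets cancers borrow information.
    pi = rng.beta(a, b, size=n_patterns)

    # Cancer-specific inclusion indicators; pi broadcasts across rows.
    gamma = rng.random((n_cancers, n_patterns)) < pi

    # Slab draws are kept only where the indicator is on; the spike is exactly 0.
    slab = rng.normal(0.0, slab_sd, size=(n_cancers, n_patterns))
    beta = np.where(gamma, slab, 0.0)
    return pi, gamma, beta


if __name__ == "__main__":
    pi, gamma, beta = sample_hierarchical_spike_slab(n_cancers=5, n_patterns=8)
    print("shared inclusion probabilities:", np.round(pi, 2))
    print("patterns selected per cancer:", gamma.sum(axis=1))
```

In a full analysis these prior draws would be replaced by posterior sampling conditional on the survival data; the sketch shows only how the shared inclusion probability links variable selection across cancers.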

RESULTS

We found that molecular patterns shared across all or most cancers were largely not predictive of survival. However, our model selected patterns unique to subsets of cancers that differentiate clinical tumor subtypes with markedly different survival outcomes. Some of these subtypes were previously established, such as subtypes of uterine corpus endometrial carcinoma, while others may be novel, such as subtypes within a set of kidney carcinomas. Through simulations, we found that the hierarchical spike-and-slab prior performs best in terms of variable selection accuracy and predictive power when borrowing information is advantageous, but also offers competitive performance when it is not.

CONCLUSIONS

We address the issue of prediction across multiple sources of data by using results from BIDIFAC+ in a Bayesian hierarchical model for overall patient survival. By incorporating spike-and-slab priors that borrow information across cancers, we identified molecular patterns that distinguish clinical tumor subtypes within a single cancer and within a group of cancers. We also corroborate the flexibility and performance of using spike-and-slab priors as a Bayesian variable selection approach.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc9d/9204947/6a15c660660a/12859_2022_4770_Fig1_HTML.jpg
