A hierarchical spike-and-slab model for pan-cancer survival using pan-omic data.

Author affiliations

Division of Biostatistics, University of Minnesota, Minneapolis, USA.

Department of Genetics, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, USA.

Publication information

BMC Bioinformatics. 2022 Jun 17;23(1):235. doi: 10.1186/s12859-022-04770-3.

DOI:10.1186/s12859-022-04770-3

PMID:35710340

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9204947/

Abstract

BACKGROUND

Pan-omics, pan-cancer analysis has advanced our understanding of the molecular heterogeneity of cancer. However, such analyses have been limited in their ability to use information from multiple sources of data (e.g., omics platforms) and multiple sample sets (e.g., cancer types) to predict clinical outcomes. We address the issue of prediction across multiple high-dimensional sources of data and sample sets by using molecular patterns identified by BIDIFAC+, a method for integrative dimension reduction of bidimensionally-linked matrices, in a Bayesian hierarchical model. Our model performs variable selection through spike-and-slab priors that borrow information across clustered data. We use this model to predict overall patient survival in The Cancer Genome Atlas, using data from 29 cancer types and 4 omics sources, and we use simulations to characterize the performance of the hierarchical spike-and-slab prior.

RESULTS

We found that molecular patterns shared across all or most cancers were largely not predictive of survival. However, our model selected patterns unique to subsets of cancers that differentiate clinical tumor subtypes with markedly different survival outcomes. Some of these subtypes were previously established, such as subtypes of uterine corpus endometrial carcinoma, while others may be novel, such as subtypes within a set of kidney carcinomas. Through simulations, we found that the hierarchical spike-and-slab prior performs best in terms of variable selection accuracy and predictive power when borrowing information is advantageous, but also offers competitive performance when it is not.

CONCLUSIONS

We address the issue of prediction across multiple sources of data by using results from BIDIFAC+ in a Bayesian hierarchical model for overall patient survival. By incorporating spike-and-slab priors that borrow information across cancers, we identified molecular patterns that distinguish clinical tumor subtypes within a single cancer and within a group of cancers. We also corroborate the flexibility and performance of using spike-and-slab priors as a Bayesian variable selection approach.
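
The abstract does not state the exact form of the hierarchical spike-and-slab prior, so the following is only a generic illustration of how such a prior can borrow information across cancers, not the authors' specification. It assumes (hypothetically) that each molecular pattern has one inclusion probability shared by all cancers, drawn from a Beta hyperprior, and that each cancer-specific coefficient is either exactly zero (the spike) or drawn from a zero-mean normal (the slab). The function name, arguments, and hyperparameter values are illustrative choices.

```python
import random


def draw_hierarchical_spike_slab(n_patterns, n_cancers, a=1.0, b=1.0,
                                 slab_sd=1.0, seed=0):
    """Draw coefficients from a generic hierarchical spike-and-slab prior.

    Assumed structure (an illustration, not the paper's model):
      pi_k     ~ Beta(a, b)           shared across cancers for pattern k
      gamma_kj ~ Bernoulli(pi_k)      inclusion of pattern k in cancer j
      beta_kj  = 0                    if gamma_kj == 0 (spike)
      beta_kj  ~ Normal(0, slab_sd)   if gamma_kj == 1 (slab)
    """
    rng = random.Random(seed)
    pis, gammas, betas = [], [], []
    for _ in range(n_patterns):
        # One inclusion probability per pattern: this shared level is what
        # lets cancers borrow information from one another.
        pi_k = rng.betavariate(a, b)
        gamma_row = [1 if rng.random() < pi_k else 0 for _ in range(n_cancers)]
        beta_row = [rng.gauss(0.0, slab_sd) if g else 0.0 for g in gamma_row]
        pis.append(pi_k)
        gammas.append(gamma_row)
        betas.append(beta_row)
    return pis, gammas, betas


if __name__ == "__main__":
    pis, gammas, betas = draw_hierarchical_spike_slab(n_patterns=3, n_cancers=5)
    for k, (pi_k, g_row) in enumerate(zip(pis, gammas)):
        print(f"pattern {k}: shared inclusion prob {pi_k:.2f}, included in {sum(g_row)} cancers")
```

Under this assumed structure, a pattern with a high shared inclusion probability tends to be selected in many cancers at once, while a pattern with a low one is shrunk to zero almost everywhere. In a full analysis these quantities would be inferred from survival data by posterior sampling rather than drawn from the prior.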
