Early Clinical Development Oncology Statistics, Pfizer Inc., San Diego, CA 92121, USA.
Department of Statistics.
Bioinformatics. 2020 Jul 1;36(13):3951-3958. doi: 10.1093/bioinformatics/btaa286.
Integrating different data sources is widely regarded as worthwhile because it can reveal functions of genomic expression that may remain hidden in a single-source analysis. Moreover, several studies have shown that analyses of multi-platform data are more powerful. To this end, we consider the omics profiles of circadian genes, such as copy number changes and RNA-sequencing data, together with the associated survival response. We develop a Bayesian structural equation model that couples linear regressions with a log-normal accelerated failure-time regression to integrate information across these two platforms and predict the survival of the subjects. We place conjugate priors on the regression parameters and derive a Gibbs sampler from their conditional distributions.
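The general idea of a Gibbs sampler under conjugate priors can be illustrated with a small sketch. This is not the authors' model or the API of their R package: it is a simplified two-equation structure on simulated data, assuming no censoring, a N(0, tau2 I) prior on the coefficients, and an inverse-gamma(a, b) prior on the error variance. The function name `gibbs_linreg`, the variable names, and all numeric settings are hypothetical choices made for illustration.

```python
import numpy as np

def gibbs_linreg(X, y, n_iter=2000, burn=500, tau2=100.0, a=2.0, b=1.0, seed=0):
    """Gibbs sampler for y = X @ beta + eps, eps ~ N(0, sigma2).

    Assumed priors (illustrative): beta ~ N(0, tau2 * I), sigma2 ~ InvGamma(a, b).
    Returns post-burn-in draws; columns are beta_0..beta_{p-1}, then sigma2.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    sigma2 = 1.0
    draws = np.empty((n_iter - burn, p + 1))
    for it in range(n_iter):
        # beta | sigma2, y is Gaussian with covariance V and mean V X'y / sigma2
        V = np.linalg.inv(XtX / sigma2 + np.eye(p) / tau2)
        mean = V @ Xty / sigma2
        beta = mean + np.linalg.cholesky(V) @ rng.standard_normal(p)
        # sigma2 | beta, y is inverse-gamma(a + n/2, b + SSR/2)
        resid = y - X @ beta
        rate = b + 0.5 * (resid @ resid)
        sigma2 = 1.0 / rng.gamma(a + 0.5 * n, 1.0 / rate)
        if it >= burn:
            draws[it - burn, :p] = beta
            draws[it - burn, p] = sigma2
    return draws

# Simulated data (hypothetical): copy number drives expression,
# and both enter the log survival time (log-normal AFT without censoring).
rng = np.random.default_rng(1)
n = 300
cnv = rng.normal(size=n)
rna = 0.5 + 1.2 * cnv + rng.normal(scale=0.5, size=n)
log_t = 1.0 + 0.8 * rna - 0.3 * cnv + rng.normal(scale=0.4, size=n)

ones = np.ones(n)
# Equation 1: expression regressed on copy number
post_rna = gibbs_linreg(np.column_stack([ones, cnv]), rna).mean(axis=0)
# Equation 2: log survival time regressed on expression and copy number
post_surv = gibbs_linreg(np.column_stack([ones, rna, cnv]), log_t).mean(axis=0)
```

Without censoring, the two equations can be sampled separately because each has its own parameters. Handling censored survival times would require an extra step, for example imputing the unobserved log times from a truncated normal within each iteration, which this sketch omits.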
Our extensive simulation study shows that the integrative model fits the data better than its closest competitor. Analyses of glioblastoma and breast cancer data from The Cancer Genome Atlas (TCGA), the largest genomics and transcriptomics database, support our findings.
The developed method is available as an R package at https://github.com/MAITYA02/semmcmc.
Supplementary data are available at Bioinformatics online.