Department of Mathematics and Statistics, University of Calgary, 2500 University Dr NW, Calgary, AB, T2N1N4, Canada.
Division of Biostatistics, University of Minnesota, 20 Delaware St SE, Minneapolis, MN 55455, USA.
Biostatistics. 2022 Dec 12;24(1):124-139. doi: 10.1093/biostatistics/kxab016.
Associating data from multiple sources while simultaneously predicting an outcome is an important problem in modern biomedical research. Solving it has the potential to identify a multidimensional array of variables predictive of a clinical outcome and to enhance our understanding of the pathobiology of complex diseases. Incorporating functional knowledge in association and prediction models can reveal pathways contributing to disease risk. We propose Bayesian hierarchical integrative analysis models that associate multiple omics data, predict a clinical outcome, allow for prior functional information, and accommodate clinical covariates. The models, motivated by available data and the need to explore other risk factors of atherosclerotic cardiovascular disease (ASCVD), are used for integrative analysis of clinical, demographic, and genomics data to identify genetic variants, genes, and gene pathways likely contributing to 10-year ASCVD risk in healthy adults. Our findings reveal several genetic variants, genes, and gene pathways that are highly associated with ASCVD risk, some of which have already been implicated in cardiovascular disease (CVD) risk. Extensive simulations demonstrate the merit of joint association and prediction models over two-stage methods, in which association is followed by prediction.