Department of Statistics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China.
Department of Pathology, School of Clinical Medicine, LKS Faculty of Medicine, The University of Hong Kong, Queen Mary Hospital, Hong Kong SAR, China.
Brief Bioinform. 2024 Jul 25;25(5). doi: 10.1093/bib/bbae454.
Compared with analyzing omics data from a single platform, an integrative analysis of multi-omics data provides a more comprehensive understanding of the regulatory relationships among biological features associated with complex diseases. However, most existing frameworks for integrative analysis overlook two crucial aspects of multi-omics data. First, they neglect the known dependencies among biological features that are recorded in highly credible biological databases. Second, most of them handle block missingness by simply removing subjects without complete omics data, which reduces statistical power. To overcome these issues, we propose a network-based integrative Bayesian framework for biomarker selection and disease outcome prediction based on multi-omics data. Our framework uses a Dirac spike-and-slab variable selection prior to identify a small subset of biomarkers. Incorporating gene pathway information improves the interpretability of the feature selection. Furthermore, our framework handles block missingness through data augmentation, following the strategy of the FBM (short for "full Bayesian model with missingness") model, in which missing omics data are augmented via a mechanistic model. A real-data application illustrates that our approach, which incorporates existing gene pathway information and includes subjects without DNA methylation data, yields more interpretable feature selection and more accurate predictions.
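For readers unfamiliar with the terminology, a generic form of a network-informed Dirac spike-and-slab prior can be sketched as follows. This is a common textbook-style formulation, not necessarily the exact specification used in the paper; the symbols $\beta_j$, $\gamma_j$, $\tau^2$, $a$, $b$, and $G$ are assumptions introduced here for illustration.

```latex
% Illustrative sketch only; all symbols are assumed, not taken from the paper.
% beta_j : effect of feature j on the outcome
% gamma_j: binary inclusion indicator for feature j
% delta_0: point mass (Dirac spike) at zero
\beta_j \mid \gamma_j \;\sim\; (1-\gamma_j)\,\delta_0 \;+\; \gamma_j\,\mathcal{N}(0,\tau^2),
\qquad \gamma_j \in \{0,1\}

% A Markov random field prior on the indicators is one standard way to
% encode a known pathway network with adjacency matrix G:
p(\gamma \mid G) \;\propto\; \exp\!\left( a\,\mathbf{1}^{\top}\gamma \;+\; b\,\gamma^{\top} G\,\gamma \right)
```

Under these assumptions, $a$ controls overall sparsity, and $b \ge 0$ raises the prior probability that features connected in $G$ are selected together, which is one way pathway information can make the selected set easier to interpret.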