Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, 19104, PA, USA.
Department of Radiology and Imaging Sciences, Indiana University, Indianapolis, 46202, IN, USA.
Brief Bioinform. 2023 Mar 19;24(2). doi: 10.1093/bib/bbad073.
With the rapid development of modern technologies, large volumes of data are now available for the systematic study of Alzheimer's disease (AD). Although many existing AD studies focus mainly on single-modality omics data, multi-omics datasets can provide a more comprehensive understanding of AD. To bridge this gap, we propose a novel structural Bayesian factor analysis framework (SBFA) that extracts the information shared by multi-omics data by aggregating genotyping data, gene expression data, neuroimaging phenotypes and prior biological network knowledge. Our approach can extract common information shared by different modalities and encourage biologically related features to be selected, guiding future AD research in a biologically meaningful way.
Our SBFA model decomposes the mean parameters of the data into a sparse factor loading matrix and a factor matrix, where the factor matrix represents the common information extracted from multi-omics and imaging data. The framework is designed to incorporate prior biological network information. In our simulation study, the proposed SBFA framework achieved the best performance among the state-of-the-art factor-analysis-based integrative analysis methods compared.
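The decomposition described above, a mean matrix written as a sparse loading matrix times a factor matrix, can be illustrated with a minimal sketch. This is not the SBFA algorithm: it omits the Bayesian inference and the biological network prior, and instead uses a truncated SVD followed by soft-thresholding as a simple stand-in. All function names, shapes and default values below are assumptions made for illustration only.

```python
import numpy as np


def simulate_factor_data(n_features=30, n_samples=50, n_factors=3,
                         sparsity=0.7, noise_sd=0.1, seed=0):
    """Simulate data whose mean is (sparse loadings) @ (factors).

    Hypothetical helper for illustration; features are rows, samples are columns.
    """
    rng = np.random.default_rng(seed)
    loadings = rng.normal(size=(n_features, n_factors))
    # Zero out roughly `sparsity` of the loading entries to make them sparse.
    keep = rng.random((n_features, n_factors)) >= sparsity
    loadings = loadings * keep
    factors = rng.normal(size=(n_factors, n_samples))
    mean = loadings @ factors
    data = mean + noise_sd * rng.normal(size=mean.shape)
    return data, loadings, factors


def soft_threshold(x, t):
    """Shrink entries of x toward zero by t; entries with |x| <= t become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def sparse_factorize(data, n_factors, threshold=0.1):
    """Stand-in factorization: truncated SVD, then soft-threshold the loadings.

    Returns (loadings, factors) with shapes (n_features, n_factors) and
    (n_factors, n_samples). This is NOT the SBFA estimator.
    """
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    scale = np.sqrt(data.shape[1])
    loadings = u[:, :n_factors] * s[:n_factors] / scale
    factors = vt[:n_factors] * scale
    return soft_threshold(loadings, threshold), factors
```

With `threshold=0.0` the product of the two returned matrices is simply the best rank-`n_factors` approximation of the data; a larger threshold trades reconstruction accuracy for sparser loadings.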
We apply our proposed SBFA model, together with several state-of-the-art factor analysis models, to simultaneously extract the latent common information from genotyping, gene expression and brain imaging data in the ADNI biobank database. The latent information is then used to predict the Functional Activities Questionnaire score, an important measure for AD diagnosis that quantifies subjects' abilities in daily life. Our SBFA model shows the best prediction performance among the factor analysis models compared.
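The second step described above, using latent factors as predictors of a clinical score, can be sketched as follows. The abstract does not state which predictive model or evaluation metric was used, so ordinary least squares and root-mean-square error are assumptions here, and the function names are hypothetical. The demo uses synthetic numbers, not ADNI data.

```python
import numpy as np


def fit_linear(factors, score):
    """Fit score ~ intercept + factors by ordinary least squares.

    factors: array of shape (n_subjects, n_factors); score: shape (n_subjects,).
    Returns coefficients with the intercept first.
    """
    design = np.column_stack([np.ones(factors.shape[0]), factors])
    coef, *_ = np.linalg.lstsq(design, score, rcond=None)
    return coef


def predict_linear(coef, factors):
    """Predict scores for new subjects from their latent factors."""
    design = np.column_stack([np.ones(factors.shape[0]), factors])
    return design @ coef


def rmse(y_true, y_pred):
    """Root-mean-square prediction error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


if __name__ == "__main__":
    # Synthetic stand-in for latent factors and a questionnaire-style score.
    rng = np.random.default_rng(1)
    factors = rng.normal(size=(100, 3))
    score = 2.0 + factors @ np.array([1.0, -0.5, 0.25]) + 0.1 * rng.normal(size=100)
    train, test = slice(0, 80), slice(80, 100)
    coef = fit_linear(factors[train], score[train])
    print("held-out RMSE:", rmse(score[test], predict_linear(coef, factors[test])))
```

Comparing factor models this way amounts to swapping in each model's latent factors and comparing held-out error on the same subjects.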
Code is publicly available at https://github.com/JingxuanBao/SBFA.