Vacher Michael, Canovas Rodrigo, Laws Simon M, Doecke James D
The Australian eHealth Research Centre, CSIRO Health and Biosecurity, Kensington, WA, Australia.
Centre for Precision Health, Edith Cowan University, Joondalup, WA, Australia.
Front Bioinform. 2024 Jun 19;4:1390607. doi: 10.3389/fbinf.2024.1390607. eCollection 2024.
Complex disorders, such as Alzheimer's disease (AD), result from the combined influence of multiple biological and environmental factors. The integration of high-throughput data from multiple omics platforms can provide system overviews, improving our understanding of complex biological processes underlying human disease. In this study, integrated data from four omics platforms were used to characterise biological signatures of AD.
The study cohort consisted of 455 participants (148 controls, 307 cases) from the Religious Orders Study and Memory and Aging Project (ROSMAP). Genotype (SNP), methylation (CpG), RNA and proteomics data were collected, quality-controlled and pre-processed (SNP = 130; CpG = 83; RNA = 91; Proteomics = 119). Using a combined diagnosis of Mild Cognitive Impairment (MCI)/AD as the target phenotype, we first used Partial Least Squares Regression as a supervised classification framework to assess the predictive capability of each omics dataset individually. We then used a variation of sparse generalized canonical correlation analysis (sGCCA) to assess predictions from the combined datasets and to identify multi-omics signatures characterising each group of participants.
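The single-dataset step can be illustrated with a minimal sketch of PLS used as a classifier. This is not the authors' pipeline: the abstract does not name the software, the number of components or the validation scheme. The sketch assumes a one-component PLS1 model, a 0.5 decision threshold, and simulated data; the helper names `pls1_fit`, `pls1_predict` and `simulate` are hypothetical.

```python
import random

def pls1_fit(X, y):
    """Fit a one-component PLS1 model (first NIPALS component)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: unit direction in feature space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Latent scores, then least-squares regression of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    b = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_mean, y_mean, w, b

def pls1_predict(model, X):
    """Return continuous predictions; threshold them to obtain class labels."""
    x_mean, y_mean, w, b = model
    return [
        y_mean + b * sum((row[j] - x_mean[j]) * w[j] for j in range(len(w)))
        for row in X
    ]

def simulate(n, p, rng, shift=1.0, informative=5):
    """Simulated data (assumption): the first few features differ between groups."""
    X, y = [], []
    for i in range(n):
        label = i % 2  # 0 = control, 1 = case
        X.append([rng.gauss(shift * label if j < informative else 0.0, 1.0)
                  for j in range(p)])
        y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = simulate(200, 20, rng)
X_test, y_test = simulate(100, 20, rng)

model = pls1_fit(X_train, y_train)
predicted = [1 if v >= 0.5 else 0 for v in pls1_predict(model, X_test)]
accuracy = sum(a == b for a, b in zip(predicted, y_test)) / len(y_test)
print(f"held-out accuracy on simulated data: {accuracy:.2f}")
```

Because the weight vector is built from the covariance between the features and the diagnosis labels, the model uses the phenotype during fitting, which is what makes this a supervised approach.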
Analysing the datasets individually, we found that methylation data provided the best predictions, with an accuracy of 0.63 (95% CI = [0.54-0.71]), followed by RNA, 0.61 (95% CI = [0.52-0.69]), SNP, 0.59 (95% CI = [0.51-0.68]) and proteomics, 0.58 (95% CI = [0.51-0.67]). After integration of the four datasets, predictions improved substantially, with a resulting accuracy of 0.95 (95% CI = [0.89-0.98]).
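For readers unfamiliar with how an accuracy is paired with a 95% confidence interval, the sketch below computes a Wilson score interval for a proportion. The abstract does not state which interval method or test-set size was used, so both the method and the counts here (57 correct out of 60) are assumptions chosen only for illustration; `wilson_interval` is a hypothetical helper.

```python
from statistics import NormalDist

def wilson_interval(correct, total, confidence=0.95):
    """Wilson score interval for a binomial proportion such as accuracy."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p_hat = correct / total
    denom = 1 + z * z / total
    centre = (p_hat + z * z / (2 * total)) / denom
    half_width = z * ((p_hat * (1 - p_hat) / total
                       + z * z / (4 * total * total)) ** 0.5) / denom
    return centre - half_width, centre + half_width

# Hypothetical counts, not taken from the study.
low, high = wilson_interval(correct=57, total=60)
print(f"accuracy = {57 / 60:.2f}, 95% CI = [{low:.2f}-{high:.2f}]")
```

The interval is asymmetric around the point estimate when accuracy is close to 1, which is also the pattern seen in the integrated result reported above.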
The integration of data from multiple platforms is a powerful approach to explore biological systems and better characterise the biological signatures of AD. The results suggest that integrative methods can identify biomarker panels with improved predictive performance compared to individual platforms alone. Further work in independent cohorts is required to validate and refine the results presented in this study.