Moon Sehwan, Lee Hyunju
School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, Korea.
J Pers Med. 2021 Jul 21;11(8):686. doi: 10.3390/jpm11080686.
High-dimensional multi-omics data integration can enhance our understanding of the complex biological interactions in human diseases. However, most studies of unsupervised multi-omics integration focus on linear integration methods. In this study, we propose a joint deep semi-non-negative matrix factorization (JDSNMF) model, which uses a hierarchical, non-linear feature extraction approach to capture shared latent features from complex multi-omics data. The latent features extracted by JDSNMF support a variety of downstream tasks, including disease prediction and module analysis. The proposed model applies not only to sample-matched data (e.g., multi-omics data from one cohort) but also to feature-matched data (e.g., omics data from multiple cohorts), so it can be applied flexibly across study designs. We demonstrate the capabilities of JDSNMF using sample-matched simulated data and feature-matched multi-omics data from Alzheimer's disease (AD) cohorts, evaluating feature extraction performance in the context of classification. In a test application, we identify AD- and age-related modules from the latent matrices using explainable artificial intelligence and a regression model. These results show that the JDSNMF model is effective at identifying latent features that capture a complex interplay of potential biological signatures.
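The abstract does not give JDSNMF's objective or update rules, so the following is only a rough illustration of the "joint" idea, not the authors' method. It swaps in a single-layer, linear semi-NMF (the multiplicative updates of Ding, Li and Jordan), where a matrix X (features × samples) is approximated as F Gᵀ with G ≥ 0 and F unconstrained. The sketch assumes that joint factorization can be expressed by stacking: sample-matched omics matrices are stacked row-wise so they share one non-negative sample factor G, and feature-matched cohorts are stacked column-wise so they share the feature factor F. The function names are hypothetical and the data are random toy matrices.

```python
import numpy as np


def _pos(a):
    # Positive part: (|A| + A) / 2
    return (np.abs(a) + a) / 2.0


def _neg(a):
    # Negative part: (|A| - A) / 2, so that A = _pos(A) - _neg(A)
    return (np.abs(a) - a) / 2.0


def semi_nmf(x, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate x (features x samples) as f @ g.T with g >= 0, f unconstrained.

    Single-layer, linear semi-NMF (Ding, Li and Jordan multiplicative updates);
    NOT the deep, non-linear JDSNMF model.
    """
    rng = np.random.default_rng(seed)
    g = rng.random((x.shape[1], k)) + eps  # strictly positive initial sample factor
    for _ in range(n_iter):
        # Closed-form least-squares update of f for the current g
        f = x @ g @ np.linalg.pinv(g.T @ g)
        xtf = x.T @ f  # (samples x k)
        ftf = f.T @ f  # (k x k)
        # Multiplicative update keeps g non-negative
        numer = _pos(xtf) + g @ _neg(ftf)
        denom = _neg(xtf) + g @ _pos(ftf) + eps
        g *= np.sqrt(numer / denom)
    # Refit f so that it is optimal for the final g
    f = x @ g @ np.linalg.pinv(g.T @ g)
    return f, g


def joint_semi_nmf_sample_matched(omics, k, **kwargs):
    """Hypothetical helper: all matrices share the same samples (columns).

    Returns one loading matrix per omics layer and one shared sample factor g.
    """
    sizes = [m.shape[0] for m in omics]
    f, g = semi_nmf(np.vstack(omics), k, **kwargs)
    return np.split(f, np.cumsum(sizes)[:-1], axis=0), g


def joint_semi_nmf_feature_matched(cohorts, k, **kwargs):
    """Hypothetical helper: all matrices share the same features (rows).

    Returns one shared feature factor f and one sample factor per cohort.
    """
    sizes = [m.shape[1] for m in cohorts]
    f, g = semi_nmf(np.hstack(cohorts), k, **kwargs)
    return f, np.split(g, np.cumsum(sizes)[:-1], axis=0)


# Toy sample-matched example: two random "omics layers" over the same 50 samples
rng = np.random.default_rng(1)
n, k = 50, 3
g_true = rng.random((n, k))
x1 = rng.normal(size=(40, k)) @ g_true.T
x2 = rng.normal(size=(30, k)) @ g_true.T
loadings, g = joint_semi_nmf_sample_matched([x1, x2], k)
print([l.shape for l in loadings], g.shape)  # → [(40, 3), (30, 3)] (50, 3)
```

Stacking with equal weights is the simplest way to force a shared factor. Per-layer weights or a deeper, layer-by-layer factorization of the coefficient matrix would be natural extensions, but the hierarchical, non-linear feature extraction that the abstract describes is not reproduced here.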