Li Wenrui, Ballard Jenna, Zhao Yize, Long Qi
Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, 19104, PA, USA.
Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, 19104, PA, USA.
Comput Struct Biotechnol J. 2024 Apr 30;23:1945-1950. doi: 10.1016/j.csbj.2024.04.053. eCollection 2024 Dec.
Integrative analysis of multi-omics data has the potential to yield valuable and comprehensive insights into the molecular mechanisms underlying complex diseases such as cancer and Alzheimer's disease. However, a number of analytical challenges complicate multi-omics data integration. For instance, -omics data are usually high-dimensional, and sample sizes in multi-omics studies tend to be modest. Furthermore, when genes in an important pathway have relatively weak signals, it can be difficult to detect them individually. There is a growing body of literature on knowledge-guided learning methods that can address these challenges by incorporating biological knowledge such as functional genomics and functional proteomics into multi-omics data analysis. These methods have been shown to outperform their counterparts that do not utilize biological knowledge in tasks including prediction, feature selection, clustering, and dimension reduction. In this review, we survey recently developed knowledge-guided multi-omics data integration methods and their applications, and discuss future research directions.