Picard Milan, Scott-Boyer Marie-Pier, Bodein Antoine, Périn Olivier, Droit Arnaud
Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada.
Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France.
Comput Struct Biotechnol J. 2021 Jun 22;19:3735-3746. doi: 10.1016/j.csbj.2021.06.030. eCollection 2021.
Increased availability of high-throughput technologies has generated an ever-growing volume of omics data that seek to portray many different but complementary biological layers, including genomics, epigenomics, transcriptomics, proteomics, and metabolomics. New insights from these data have been obtained by machine learning algorithms that have produced diagnostic and classification biomarkers. Most biomarkers obtained to date, however, include only one omics measurement at a time and thus do not take full advantage of recent multi-omics experiments that now capture the entire complexity of biological systems. Multi-omics data integration strategies are needed to combine the complementary knowledge brought by each omics layer. We have summarized the most recent data integration methods/frameworks into five different integration strategies: early, mixed, intermediate, late, and hierarchical. In this mini-review, we focus on challenges and existing multi-omics integration strategies, paying special attention to machine learning applications.
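As a rough illustration of two of the five strategies named above, the sketch below contrasts early and late integration on synthetic data. It assumes the common reading of these terms (early: concatenate all omics features into one matrix before fitting a single model; late: fit one model per omics layer and combine their predictions), which the abstract itself does not spell out. The toy data, the nearest-centroid classifier, and all function names are hypothetical and are not taken from the review.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two omics blocks measured on the same 40 samples.
n_samples = 40
labels = np.repeat([0, 1], n_samples // 2)
transcriptomics = rng.normal(size=(n_samples, 50)) + labels[:, None]
proteomics = rng.normal(size=(n_samples, 20)) + labels[:, None]
blocks = [transcriptomics, proteomics]

# Alternate samples between training and test sets.
train = np.arange(n_samples) % 2 == 0
test = ~train


def scale(block, train_mask):
    """Standardize features using training-set statistics only."""
    mean = block[train_mask].mean(axis=0)
    std = block[train_mask].std(axis=0)
    return (block - mean) / std


def centroid_scores(x_train, y_train, x_test):
    """Nearest-centroid score; positive means closer to the class-1 centroid."""
    c0 = x_train[y_train == 0].mean(axis=0)
    c1 = x_train[y_train == 1].mean(axis=0)
    d0 = np.linalg.norm(x_test - c0, axis=1)
    d1 = np.linalg.norm(x_test - c1, axis=1)
    return d0 - d1


def early_integration(blocks, y, train_mask, test_mask):
    """Concatenate all scaled blocks, then fit a single model."""
    x = np.hstack([scale(b, train_mask) for b in blocks])
    scores = centroid_scores(x[train_mask], y[train_mask], x[test_mask])
    return (scores > 0).astype(int)


def late_integration(blocks, y, train_mask, test_mask):
    """Fit one model per block, then average the per-block scores."""
    per_block = []
    for b in blocks:
        x = scale(b, train_mask)
        per_block.append(centroid_scores(x[train_mask], y[train_mask], x[test_mask]))
    return (np.mean(per_block, axis=0) > 0).astype(int)


early_pred = early_integration(blocks, labels, train, test)
late_pred = late_integration(blocks, labels, train, test)
early_acc = float((early_pred == labels[test]).mean())
late_acc = float((late_pred == labels[test]).mean())
print(f"early integration accuracy: {early_acc:.2f}")
print(f"late integration accuracy:  {late_acc:.2f}")
```

On real data the two approaches trade off differently: concatenation lets a model see cross-layer feature interactions but inflates dimensionality, whereas per-layer models stay small but only interact at the prediction level.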