Department of Computer Science, Stony Brook University, Stony Brook, New York, United States of America.
Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America.
PLoS Comput Biol. 2020 Apr 2;16(4):e1007677. doi: 10.1371/journal.pcbi.1007677. eCollection 2020 Apr.
The molecular mechanisms and functions of complex biological systems remain elusive. Recent high-throughput techniques, such as next-generation sequencing, have generated a wide variety of multiomics datasets that enable the identification of biological functions and mechanisms from multiple facets. However, integrating these large-scale multiomics data and extracting functional insights from them remain challenging tasks. To address these challenges, machine learning has been broadly applied to analyze multiomics data. This review introduces multiview learning, an emerging machine learning field, and envisions its potentially powerful applications to multiomics. In particular, multiview learning is more effective than previous integrative methods at learning the heterogeneity of the data and revealing cross-talk patterns. Although it has been applied in various contexts, such as computer vision and speech recognition, multiview learning has not yet been widely applied to biological data, specifically multiomics data. Therefore, this paper first reviews recent multiview learning methods and unifies them in a framework called multiview empirical risk minimization (MV-ERM). We further discuss the potential applications of each method to multiomics, including genomics, transcriptomics, and epigenomics, with the aim of discovering functional and mechanistic interpretations across omics. Second, we explore possible applications to different biological systems, including human diseases (e.g., brain disorders and cancers), plants, and single-cell analysis, and discuss both the benefits and caveats of using multiview learning to discover the molecular mechanisms and functions of these systems.