Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA.
Center for Computational Biology, University of California, Berkeley, CA, USA.
Mol Syst Biol. 2020 Sep;16(9):e9198. doi: 10.15252/msb.20199198.
Generative models provide a well-established statistical framework for evaluating uncertainty and deriving conclusions from large data sets, especially in the presence of noise, sparsity, and bias. Initially developed for computer vision and natural language processing, these models have been shown to effectively summarize the complexity that underlies many types of data and to enable a range of applications, including supervised learning tasks, such as assigning labels to images; unsupervised learning tasks, such as dimensionality reduction; and out-of-sample generation, such as de novo image synthesis. Following this early success, generative models are now increasingly used in molecular biology, with applications ranging from designing new molecules with properties of interest, to identifying deleterious mutations in our genomes, to dissecting transcriptional variability between single cells. In this review, we provide a brief overview of the technical notions behind generative models and their implementation with deep learning techniques. We then describe several ways in which these models can be used in practice, drawing on recent applications in molecular biology as examples.