Wan Fangping, Kontogiorgos-Heintz Daphne, de la Fuente-Nunez Cesar
Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
Digit Discov. 2022 Mar 31;1(3):195-208. doi: 10.1039/d1dd00024a. eCollection 2022 Jun 13.
Computers can already be programmed for superhuman pattern recognition of images and text. For machines to discover novel molecules, they must first be trained to sort through the many characteristics of molecules and determine which properties should be retained, suppressed, or enhanced to optimize functions of interest. Machines need to be able to understand, read, write, and eventually create new molecules. Today, this creative process relies on deep generative models, which have gained popularity since powerful deep neural networks were introduced to generative model frameworks. In recent years, they have demonstrated an excellent ability to model complex distributions of real-world data (e.g., images, audio, text, molecules, and biological sequences). Deep generative models can generate data beyond those provided in training samples, thus yielding an efficient and rapid tool for exploring the massive search space of high-dimensional data such as DNA/protein sequences and facilitating the design of biomolecules with desired functions. Here, we review the emerging field of deep generative models applied to peptide science. In particular, we discuss several popular deep generative model frameworks as well as their applications to generate peptides with various kinds of properties (e.g., antimicrobial, anticancer, cell penetration, etc.). We conclude our review with a discussion of current limitations and future perspectives in this emerging field.