Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA.
Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA.
Nat Chem. 2021 Oct;13(10):992-1000. doi: 10.1038/s41557-021-00766-3. Epub 2021 Aug 9.
There are more amino acid permutations within a 40-residue sequence than atoms on Earth. This vast chemical search space hinders the use of human learning to design functional polymers. Here we show how machine learning enables the de novo design of abiotic nuclear-targeting miniproteins to traffic antisense oligomers to the nucleus of cells. We combined high-throughput experimentation with a directed evolution-inspired deep-learning approach in which the molecular structures of natural and unnatural residues are represented as topological fingerprints. The model is able to predict activities beyond the training dataset, and simultaneously deciphers and visualizes sequence-activity predictions. The predicted miniproteins, termed 'Mach', reach an average mass of 10 kDa, are more effective than any previously known variant in cells and can also deliver proteins into the cytosol. The Mach miniproteins are non-toxic and efficiently deliver antisense cargo in mice. These results demonstrate that deep learning can decipher design principles to generate highly active biomolecules that are unlikely to be discovered by empirical approaches.
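The opening claim about the size of the search space can be checked with simple arithmetic. The sketch below counts 40-residue sequences over the 20 canonical amino acids and compares the result with an order-of-magnitude estimate of 10^50 atoms on Earth. That estimate, and all names in the code, are illustrative assumptions and do not come from the paper. Allowing unnatural residues, as the authors do, would only enlarge the alphabet and the count.

```python
# Size of the sequence space for a 40-residue peptide, compared with a rough
# estimate of the number of atoms on Earth.
NATURAL_AMINO_ACIDS = 20            # canonical residues only
SEQUENCE_LENGTH = 40                # residues, as stated in the abstract
ATOMS_ON_EARTH_ESTIMATE = 10**50    # assumption: order-of-magnitude figure


def sequence_space(alphabet_size: int, length: int) -> int:
    """Number of distinct sequences of `length` over `alphabet_size` symbols."""
    return alphabet_size ** length


n_sequences = sequence_space(NATURAL_AMINO_ACIDS, SEQUENCE_LENGTH)
ratio = n_sequences // ATOMS_ON_EARTH_ESTIMATE

print(f"{n_sequences:.2e} possible sequences")
print(f"roughly {ratio} times the assumed atom count")
```

Under these assumptions, the sequence count exceeds the atom estimate by about two orders of magnitude, which is consistent with the abstract's statement.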