Deep learning for optimization of protein expression.

Author information

School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JH, UK.

School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JH, UK; School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK; The Alan Turing Institute, London NW1 2DB, UK.

Publication information

Curr Opin Biotechnol. 2023 Jun;81:102941. doi: 10.1016/j.copbio.2023.102941. Epub 2023 Apr 21.

DOI:10.1016/j.copbio.2023.102941

PMID:37087839

Abstract

Advances in high-throughput DNA synthesis and sequencing have fuelled the use of massively parallel reporter assays for strain characterization. These experiments produce large datasets that map DNA sequences to protein expression levels, and have sparked increased interest in data-driven methods for sequence-to-expression modeling. Here, we highlight progress in deep learning models of protein expression and their potential for optimizing strains engineered to produce recombinant proteins. We discuss recent works that built highly accurate models as well as the challenges that hinder wider adoption by end users. There is a need to better align this technology with the requirements and capabilities encountered in strain engineering, particularly the cost of data acquisition and the need for interpretable models that generalize beyond the training data. Overcoming these barriers will help to incentivize academic and industrial laboratories to tap into a new era of data-centric strain engineering.
