Wen Xiao, Lin Jiawei, Yang Chunhe, Li Ying, Cheng Haijiao, Liu Ye, Zhang Yue, Ma Hongwu, Mao Yufeng, Liao Xiaoping, Wang Meng
School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230026, China.
Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China.
Synth Syst Biotechnol. 2024 May 17;9(4):647-657. doi: 10.1016/j.synbio.2024.05.010. eCollection 2024 Dec.
Using standardized artificial regulatory sequences to fine-tune the expression of multiple metabolic pathways and genes is a key strategy for building efficient microbial cell factories. However, when the strengths of regulatory sequences are characterized with only a few reporter genes, those strengths may not transfer to other genes. This introduces considerable uncertainty into the precise regulation of multiple genes at multiple expression levels. To address this, our study adopted a fluorescent protein fusion strategy to assess target protein expression levels more accurately. We combined 41 commonly used metabolic genes with 15 regulatory sequences, yielding an expression dataset of 520 unique combinations. This dataset revealed substantial variation in protein expression levels under identical regulatory sequences, with relative expression levels ranging from 2.8- to 176-fold. It also showed that increasing the strength of a regulatory sequence does not necessarily produce a significant increase in the expression level of the target protein. Using this dataset, we developed various machine learning models and found that integrating the promoter region, ribosome binding site, and coding sequence significantly improves the accuracy of predicting protein expression levels, reaching a Spearman correlation coefficient of 0.72, with the promoter sequence exerting the predominant influence. Our study aims not only to provide a detailed guide for fine-tuning gene expression in the metabolic engineering of but also to deepen our understanding of the compatibility issues between regulatory sequences and target genes.
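The prediction accuracy above is reported as a Spearman correlation coefficient, which compares the *rank order* of predicted and measured expression levels rather than their raw values. The sketch below is illustrative only and is not the authors' code: it computes Spearman's rho in plain Python as the Pearson correlation of the two rank vectors, giving tied values the average of their ranks. The function names `average_ranks` and `spearman_rho` and the five expression values in the usage lines are hypothetical and do not come from the study's dataset; in practice, `scipy.stats.spearmanr` computes the same statistic.

```python
from math import sqrt
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values (a tie group).
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i..j are 0-based sorted positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (sx * sy)


# Hypothetical measured vs. model-predicted relative expression levels.
measured = [1.0, 4.2, 2.8, 176.0, 35.5]
predicted = [1.5, 3.0, 5.0, 120.0, 40.0]
print(round(spearman_rho(measured, predicted), 3))  # prints 0.9
```

Because only ranks enter the calculation, a model can score well under this metric even when its absolute predictions are off by a constant factor, as long as it orders the gene and regulatory-sequence combinations correctly.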