Zhao Cuihuan, Guan Yuying, Yan Shuan, Li Jiahang
Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China.
Department of Engineering Physics, Institute of Public Safety Research, Tsinghua University, Beijing 100084, China.
Int J Mol Sci. 2024 Dec 6;25(23):13137. doi: 10.3390/ijms252313137.
Promoters, as core elements in the regulation of gene expression, play a pivotal role in genetic engineering and synthetic biology. The accurate prediction and optimization of promoter strength are essential for advancing these fields. Here, we present the first promoter strength database tailored to , an extremophilic microorganism, and propose a novel promoter design and prediction method based on generative adversarial networks (GANs) and multi-model fusion. The GAN model effectively learns the key features of promoter sequences, such as the GC content and Moran's coefficients, to generate biologically plausible promoter sequences. To enhance prediction accuracy, we developed a multi-model fusion framework integrating deep learning and machine learning approaches. Deep learning models, incorporating BiLSTM and CNN architectures, capture k-mer and PSSM features, whereas machine learning models utilize engineered string and non-string features to construct comprehensive feature matrices for the multidimensional analysis and prediction of promoter strength. Using the proposed framework, newly generated promoters via mutation were predicted, and their functional validity was experimentally confirmed. The integration of multiple models significantly reduced the experimental validation space through an intersection-based strategy, achieving a notable improvement in top quantile prediction accuracy, particularly within the top five quantiles. The robustness and applicability of this model were further validated on diverse datasets, including test sets and out-of-sample promoters. This study not only introduces an innovative approach for promoter design and prediction in but also lays a foundation for advancing industrial biotechnology. Additionally, the proposed strategy of GAN-based generation coupled with multi-model prediction demonstrates versatility, offering a valuable reference for promoter design and strength prediction in other extremophiles. 
Our findings highlight the promising synergy between artificial intelligence and synthetic biology, underscoring their profound academic and practical implications.
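As an illustration only, two of the ideas named in the abstract can be sketched in a few lines of Python: simple sequence descriptors (GC content and k-mer counts) and an intersection-based shortlist across several predictors. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the `fraction` cutoff, and the example scores are all hypothetical. The GAN, the PSSM features, and Moran's coefficients are not reproduced here.

```python
from collections import Counter
from itertools import product


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_counts(seq: str, k: int = 3) -> dict[str, int]:
    """Count overlapping k-mers over the ACGT alphabet, including zero counts."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: counts.get(kmer, 0) for kmer in all_kmers}


def top_fraction(scores: dict[str, float], fraction: float) -> set[str]:
    """Names of the highest-scoring candidates, always keeping at least one."""
    n_keep = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_keep])


def intersect_top(model_scores: list[dict[str, float]], fraction: float = 0.2) -> set[str]:
    """Candidates that every model places within its own top fraction."""
    tops = [top_fraction(scores, fraction) for scores in model_scores]
    return set.intersection(*tops) if tops else set()


# Hypothetical predicted strengths from two models (illustrative values only).
deep_scores = {"p1": 0.9, "p2": 0.8, "p3": 0.1, "p4": 0.4, "p5": 0.3}
ml_scores = {"p1": 0.7, "p2": 0.2, "p3": 0.6, "p4": 0.9, "p5": 0.1}

# Only candidates ranked highly by both models remain for experimental testing.
shortlist = intersect_top([deep_scores, ml_scores], fraction=0.4)
```

Requiring agreement between models shrinks the candidate set that must be tested experimentally, at the cost of discarding promoters that only one model ranks highly.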