EVMP: enhancing machine learning models for synthetic promoter strength prediction by Extended Vision Mutant Priority framework.

Authors

Yang Weiqin, Li Dexin, Huang Ranran

Affiliations

Institute of Marine Science and Technology, Shandong University, Qingdao, China.

School of Computer Science and Technology, Shandong University, Qingdao, China.

Publication information

Front Microbiol. 2023 Jul 5;14:1215609. doi: 10.3389/fmicb.2023.1215609. eCollection 2023.

DOI:10.3389/fmicb.2023.1215609

PMID:37476664

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10354429/

Abstract

INTRODUCTION

In metabolic engineering and synthetic biology applications, promoters with appropriate strengths are critical. However, annotating promoter strength experimentally is time-consuming and laborious. Constructing mutation-based synthetic promoter libraries that span multiple orders of magnitude of promoter strength is therefore receiving increasing attention. A number of machine learning (ML) methods have been applied to synthetic promoter strength prediction, but existing models are limited by the excessive proximity between synthetic promoters, which differ from one another by only a few mutations.

METHODS

To help ML models better predict synthetic promoter strength, we propose EVMP (Extended Vision Mutant Priority), a universal framework that utilizes mutation information more effectively. In EVMP, synthetic promoters are equivalently transformed into a base promoter and the corresponding k-mer mutations, which are input into BaseEncoder and VarEncoder, respectively. EVMP also provides optional data augmentation, which generates multiple copies of the data by selecting different base promoters for the same synthetic promoter.
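The decomposition described above can be illustrated with a minimal Python sketch. It is not taken from the EVMP source code: the function names (`kmer_mutations`, `augment`), the substitution-only assumption, the centred window, and the Hamming-distance ranking of candidate base promoters are all hypothetical choices made for illustration. Each mutated site is reported together with the k-mer (a window of k bases) around it, which is the "extended vision" idea in its simplest form.

```python
from typing import List, Tuple

Mutation = Tuple[int, str]  # (0-based position, k-mer around the mutated site)


def kmer_mutations(base: str, synthetic: str, k: int = 5) -> List[Mutation]:
    """List the k-mers of `synthetic` centred on every site that differs from `base`.

    Assumption: only substitutions occur, so both sequences have equal length.
    Windows that would run past either end of the sequence are truncated.
    """
    if len(base) != len(synthetic):
        raise ValueError("this sketch handles substitutions only")
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    half = k // 2
    mutations: List[Mutation] = []
    for i, (b, s) in enumerate(zip(base, synthetic)):
        if b != s:
            start = max(0, i - half)
            end = min(len(synthetic), i + half + 1)
            mutations.append((i, synthetic[start:end]))
    return mutations


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def augment(synthetic: str, candidates: List[str], k: int = 5,
            n: int = 2) -> List[Tuple[str, List[Mutation]]]:
    """Pair one synthetic promoter with its n closest candidate base promoters.

    Assumption: "closest" means smallest Hamming distance; the selection rule
    actually used by EVMP may differ.
    """
    ranked = sorted(candidates, key=lambda c: hamming(c, synthetic))
    return [(base, kmer_mutations(base, synthetic, k)) for base in ranked[:n]]


if __name__ == "__main__":
    base = "ACGTACGTAC"
    synthetic = "ACGTTCGTAC"  # single substitution at position 4
    print(kmer_mutations(base, synthetic, k=3))
    print(augment(synthetic, ["AAAAAAAAAA", base], k=3, n=1))
```

In this sketch each (base promoter, mutation list) pair would be one training example, so choosing several base promoters for the same synthetic promoter yields several examples that share one strength label.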

RESULTS

In the Trc synthetic promoter library, EVMP was applied to multiple ML models and improved their performance to varying extents, by up to 61.30% (MAE), while the state-of-the-art (SOTA) record was improved by 15.25% (MAE) and 4.03% (R²). Data augmentation based on multiple base promoters further improved model performance, by 17.95% (MAE) and 7.25% (R²) compared with the non-EVMP SOTA record.
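For readers who want to reproduce comparisons of this kind, the following Python sketch computes mean absolute error (MAE), the coefficient of determination R² (assumed here to be the second metric reported), and a relative improvement in percent. The abstract does not state how the percentages were derived, so the `relative_improvement` formula is an assumption, and all function names are hypothetical.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between measured and predicted strengths."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def relative_improvement(baseline: float, new: float,
                         lower_is_better: bool) -> float:
    """Improvement of `new` over `baseline`, in percent of the baseline.

    Assumption: improvement is measured relative to the baseline value;
    the paper may define it differently.
    """
    change = (baseline - new) if lower_is_better else (new - baseline)
    return 100 * change / abs(baseline)


if __name__ == "__main__":
    measured = [1.0, 2.0, 3.0]
    predicted = [1.0, 2.0, 4.0]
    print(mae(measured, predicted), r2(measured, predicted))
    # MAE is an error, so a lower value counts as an improvement.
    print(relative_improvement(4.0, 3.0, lower_is_better=True))
    # R² is a score, so a higher value counts as an improvement.
    print(relative_improvement(0.5, 0.75, lower_is_better=False))
```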

DISCUSSION

In further study, extended vision (or k-mer) is shown to be essential for EVMP. We also found that EVMP can alleviate the over-smoothing phenomenon, which may contribute to its effectiveness. Our work suggests that EVMP can highlight the mutation information of synthetic promoters and significantly improve the accuracy of strength prediction. The source code is publicly available on GitHub: https://github.com/Tiny-Snow/EVMP.
