Generating information-dense promoter sequences with optimal string packing.

Author affiliations

Biomedical Engineering Department, Boston University, Boston, Massachusetts, United States of America.

Biological Design Center, Boston University, Boston, Massachusetts, United States of America.

Publication information

PLoS Comput Biol. 2024 Jul 24;20(7):e1012276. doi: 10.1371/journal.pcbi.1012276. eCollection 2024 Jul.

DOI:10.1371/journal.pcbi.1012276
PMID:39047028
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11268586/
Abstract

Dense arrangements of binding sites within nucleotide sequences can collectively influence downstream transcription rates or initiate biomolecular interactions. For example, natural promoter regions can harbor many overlapping transcription factor binding sites that influence the rate of transcription initiation. Despite the prevalence of overlapping binding sites in nature, rapid design of nucleotide sequences with many overlapping sites remains a challenge. Here, we show that this is an NP-hard problem, coined here as the nucleotide String Packing Problem (SPP). We then introduce a computational technique that efficiently assembles sets of DNA-protein binding sites into dense, contiguous stretches of double-stranded DNA. For the efficient design of nucleotide sequences spanning hundreds of base pairs, we reduce the SPP to an Orienteering Problem with integer distances, and then leverage modern integer linear programming solvers. Our method optimally packs sets of 20-100 binding sites into dense nucleotide arrays of 50-300 base pairs in 0.05-10 seconds. Unlike approximation algorithms or meta-heuristics, our approach finds provably optimal solutions. We demonstrate how our method can generate large sets of diverse sequences suitable for library generation, where the frequency of binding site usage across the returned sequences can be controlled by modulating the objective function. As an example, we then show how adding additional constraints, like the inclusion of sequence elements with fixed positions, allows for the design of bacterial promoters. The nucleotide string packing approach we present can accelerate the design of sequences with complex DNA-protein interactions. When used in combination with synthesis and high-throughput screening, this design strategy could help interrogate how complex binding site arrangements impact either gene expression or biomolecular mechanisms in varied cellular contexts.
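
The abstract describes reducing the SPP to an Orienteering Problem with integer distances: each binding site is a node, the distance from one site to the next is the number of extra bases needed to append it, and the goal is to visit as many sites as possible within a length budget. The following is a minimal Python sketch of that idea, not the authors' implementation. It replaces the integer linear programming solver with exhaustive search (practical only for a handful of sites), considers a single strand only, and ignores reverse complements. The function names `overlap` and `pack` and the four toy sites are hypothetical and chosen purely for illustration.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def pack(sites, budget):
    """Brute-force string packing (illustrative stand-in for an ILP solver).

    Tries every ordering of every subset of `sites`, merges consecutive
    sites on their maximal overlap, and keeps the ordering that fits the
    most sites into at most `budget` bases (ties go to the shorter sequence).
    Returns (sequence, ordered_sites); ("", ()) if nothing fits.
    """
    best_seq, best_order = "", ()
    for r in range(1, len(sites) + 1):
        for order in permutations(sites, r):
            seq = order[0]
            for prev, nxt in zip(order, order[1:]):
                # Integer "distance" prev -> nxt is len(nxt) - overlap(prev, nxt);
                # only those extra bases are appended.
                seq += nxt[overlap(prev, nxt):]
            if len(seq) > budget:
                continue
            better = len(order) > len(best_order) or (
                len(order) == len(best_order) and len(seq) < len(best_seq)
            )
            if better:
                best_seq, best_order = seq, order
    return best_seq, best_order


# Hypothetical toy binding sites (20 bases in total if simply concatenated).
SITES = ["ATGCA", "GCATT", "TTACG", "CGGGA"]

if __name__ == "__main__":
    seq, order = pack(SITES, budget=13)
    print(seq, order)
```

Because the search enumerates every ordered subset, its cost grows factorially with the number of sites. That growth is the reason the abstract turns to integer linear programming solvers for sets of 20-100 sites.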

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a98/11268586/3fd5db91f59b/pcbi.1012276.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a98/11268586/aa3c60eb6830/pcbi.1012276.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a98/11268586/59911ff23d36/pcbi.1012276.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a98/11268586/21496799195f/pcbi.1012276.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a98/11268586/90ca07ab13bd/pcbi.1012276.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a98/11268586/e12641b32f3b/pcbi.1012276.g006.jpg
