CoLiDe: Combinatorial Library Design tool for probing protein sequence space.

Affiliations

Department of Cell Biology, Faculty of Science, Charles University, Biocev, Prague, Czech Republic.

Department of Biochemistry, Faculty of Science, Charles University, 128 00 Prague 2, Czech Republic.

Publication information

Bioinformatics. 2021 May 1;37(4):482-489. doi: 10.1093/bioinformatics/btaa804.

DOI:10.1093/bioinformatics/btaa804

PMID:32956450

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8088326/

Abstract

MOTIVATION

Current techniques of protein engineering focus mostly on re-designing small targeted regions or defined structural scaffolds rather than constructing combinatorial libraries of versatile compositions and lengths. This is a missed opportunity because combinatorial libraries are emerging as a vital source of novel functional proteins and are of interest in diverse research areas.

RESULTS

Here, we present a computational tool for Combinatorial Library Design (CoLiDe) offering precise control over protein sequence composition, length and diversity. The algorithm uses an evolutionary approach to provide solutions to combinatorial libraries of degenerate DNA templates. We demonstrate its performance and precision using four different input alphabet distributions on different sequence lengths. In addition, a model design and experimental pipeline for protein library expression and purification is presented, providing a proof-of-concept that our protocol can be used to prepare purified protein library samples of up to 10^11-10^12 unique sequences. CoLiDe presents a composition-centric approach to protein design towards different functional phenomena.
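To make the idea of a degenerate DNA template concrete, the following is a minimal Python sketch of how a template written in IUPAC ambiguity codes maps to an amino acid composition and how far that composition lies from a target. It is not taken from CoLiDe: the function names, the equimolar-mixing assumption and the absolute-difference error measure are all illustrative assumptions, and the abstract does not specify the scoring used by the tool's evolutionary search.

```python
from collections import Counter
from itertools import product

# IUPAC nucleotide ambiguity codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code; codons enumerated in T, C, A, G order per position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_distribution(degenerate_codon):
    """Amino acid frequencies for one degenerate codon.

    Assumption: bases at each degenerate position are mixed equimolarly.
    """
    options = [IUPAC[base] for base in degenerate_codon.upper()]
    counts = Counter(CODON_TABLE["".join(c)] for c in product(*options))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def template_composition(codons):
    """Average the per-position distributions over a whole template."""
    totals = Counter()
    for codon in codons:
        for aa, freq in codon_distribution(codon).items():
            totals[aa] += freq
    return {aa: value / len(codons) for aa, value in totals.items()}


def composition_error(observed, target):
    """Sum of absolute differences between two compositions (hypothetical score)."""
    keys = set(observed) | set(target)
    return sum(abs(observed.get(k, 0.0) - target.get(k, 0.0)) for k in keys)


def dna_diversity(codons):
    """Number of distinct DNA sequences encoded by the template."""
    n = 1
    for codon in codons:
        for base in codon.upper():
            n *= len(IUPAC[base])
    return n


if __name__ == "__main__":
    template = ["GCT", "RST", "NNK"]  # illustrative template, not from the paper
    print(template_composition(template))
    print(dna_diversity(template))
```

An evolutionary search in this spirit would mutate the codons of a candidate template and keep the variants with lower `composition_error` against the desired amino acid distribution; whether CoLiDe scores candidates this way is not stated in the abstract.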

AVAILABILITY AND IMPLEMENTATION

CoLiDe is implemented in Python and freely available at https://github.com/voracva1/CoLiDe.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
