Cai Yizhi, Hartnett Brian, Gustafsson Claes, Peccoud Jean
Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Washington Street, MC 0477, Blacksburg VA 24061, USA.
Bioinformatics. 2007 Oct 15;23(20):2760-7. doi: 10.1093/bioinformatics/btm446. Epub 2007 Sep 5.
The sequence of artificial genetic constructs is composed of multiple functional fragments, or genetic parts, involved in different molecular steps of gene expression mechanisms. Biologists have deciphered structural rules that the design of genetic constructs needs to follow in order to ensure a successful completion of the gene expression process, but these rules have not been formalized, making it challenging for non-specialists to benefit from the recent progress in gene synthesis.
We show that context-free grammars (CFG) can formalize these design principles. This approach provides a path to organizing libraries of genetic parts according to their biological functions, which correspond to the syntactic categories of the CFG. It also provides a framework for the systematic design of new genetic constructs consistent with the design principles expressed in the CFG. Combined with parsing algorithms, this syntactic model also makes it possible to verify existing constructs against the same rules. We illustrate these possibilities by describing a CFG that generates the most common architectures of genetic constructs in Escherichia coli.
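To make the idea concrete, the following is a minimal sketch of how a CFG can both organize a parts library and verify a construct. It is not the grammar described in the article: the rules, the category names (PRO, RBS, CDS, TER), the part names, and the function `is_valid` are all assumptions chosen for illustration.

```python
from functools import lru_cache

# Hypothetical toy grammar (NOT the grammar from the article).
# Nonterminals are the keys; any other symbol is a terminal part category.
GRAMMAR = {
    "Construct": [("Cassette",), ("Cassette", "Construct")],
    "Cassette": [("PRO", "Cistrons", "TER")],
    "Cistrons": [("Cistron",), ("Cistron", "Cistrons")],
    "Cistron": [("RBS", "CDS")],
}

# Hypothetical parts library: each part is filed under one syntactic category.
PARTS = {
    "promoter_A": "PRO",
    "rbs_A": "RBS",
    "cds_A": "CDS",
    "cds_B": "CDS",
    "terminator_A": "TER",
}


def is_valid(part_names, start="Construct"):
    """Return True if the ordered list of parts can be derived from `start`."""
    try:
        tokens = tuple(PARTS[name] for name in part_names)
    except KeyError:
        return False  # a part that is not in the library cannot be checked

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        # Can `symbol` generate exactly tokens[i:j]?
        if symbol not in GRAMMAR:  # terminal category: must match one token
            return j == i + 1 and tokens[i] == symbol
        return any(matches(rhs, i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def matches(rhs, i, j):
        # Can the symbol sequence `rhs` generate exactly tokens[i:j]?
        if not rhs:
            return i == j
        head, rest = rhs[0], rhs[1:]
        # The toy grammar has no empty productions, so every remaining
        # symbol needs at least one token; that bounds the split point k.
        return any(
            derives(head, i, k) and matches(rest, k, j)
            for k in range(i + 1, j - len(rest) + 1)
        )

    return derives(start, 0, len(tokens))


if __name__ == "__main__":
    print(is_valid(["promoter_A", "rbs_A", "cds_A", "terminator_A"]))
    print(is_valid(["promoter_A", "rbs_A", "cds_A"]))
```

Under these assumed rules, the first call describes a complete expression cassette and is accepted, while the second lacks a terminator and is rejected. The memoized span check is a simple stand-in for a general parsing algorithm; it works here because the toy grammar has no left recursion and no empty productions.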
A web site allows readers to experiment with the algorithms presented in this article: www.genocad.org.
Sequences and models are available at Bioinformatics online.