Kristan K. S. Buck, Venkatanarayanan Subramanian, David E. Block
Department of Chemical Engineering and Materials Science, and Department of Viticulture and Enology, University of California, Davis, One Shields Avenue, Davis, California 95616.
Biotechnol Prog. 2002 Nov-Dec;18(6):1366-76. doi: 10.1021/bp020112p.
To develop a useful fermentation process model, it is first necessary to identify which batch operating parameters are critical in determining the process outcome. To identify critical processing inputs in large databases, we have explored the use of Decision Tree Analysis with the decision metrics of Gain (i.e., Shannon Entropy changes), Gain Ratio, and a multiple hypergeometric distribution. The usefulness of this approach lies in its ability to treat "categorical" variables, which are typical of archived fermentation databases, as well as "continuous" variables. In this work, we demonstrate the use of Decision Tree Analysis for the problem of optimizing recombinant green fluorescent protein production in E. coli. A database of 85 fermentations was generated to examine the effect of 15 process input parameters on final biomass yield, maximum recombinant protein concentration, and productivity. The use of Decision Tree Analysis led to a considerable reduction in the fermentation database through the identification of the significant as well as insignificant inputs. However, different decision metrics selected different inputs and different numbers of inputs to classify the data for each output.
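The Gain and Gain Ratio metrics named in the abstract can be illustrated with a short sketch. It uses the usual textbook definitions (Gain as the drop in Shannon entropy after a split, Gain Ratio as Gain divided by the entropy of the split itself) and is not the authors' implementation. The function names, the `medium` input, and the `yield_class` labels are hypothetical, invented only for illustration; the multiple hypergeometric metric is not covered.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of categorical items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_and_gain_ratio(values, labels):
    """Return (Gain, Gain Ratio) for splitting `labels` on the categorical `values`."""
    n = len(labels)

    # Group the output labels by the category of the input variable.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)

    # Weighted entropy remaining after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder

    # Split information: entropy of the input variable's own distribution.
    split_info = entropy(values)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


# Hypothetical toy data: one categorical process input and a binned output.
medium = ["A", "A", "B", "B", "C", "C", "A", "B"]
yield_class = ["high", "high", "low", "low", "low", "high", "high", "low"]

gain, ratio = gain_and_gain_ratio(medium, yield_class)
print(f"Gain = {gain:.3f} bits, Gain Ratio = {ratio:.3f}")
```

Because Gain Ratio penalizes inputs that split the data into many categories, the two metrics can rank the same inputs differently, which is consistent with the abstract's observation that different decision metrics selected different inputs.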