Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, Wisconsin, United States of America.
DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, Wisconsin, United States of America.
PLoS Comput Biol. 2020 Aug 17;16(8):e1008137. doi: 10.1371/journal.pcbi.1008137. eCollection 2020 Aug.
Genome-scale metabolic models have been utilized extensively in the study and engineering of the organisms they describe. Here we present the analysis of a published dataset from pooled transposon mutant fitness experiments as an approach for improving the accuracy and gene-reaction associations of a metabolic model for Zymomonas mobilis ZM4, an industrially relevant ethanologenic organism with extremely high glycolytic flux and low biomass yield. Gene essentiality predictions made by the draft model were compared to data from individual pooled mutant experiments to identify areas of the model requiring deeper validation. Subsequent experiments showed that some of the discrepancies between the model and dataset were caused by polar effects, mis-mapped barcodes, or mutants carrying both wild-type and transposon-disrupted gene copies, highlighting potential limitations inherent to data from individual mutants in these high-throughput datasets. Therefore, we analyzed correlations in fitness scores across all 492 experiments in the dataset in the context of functionally related metabolic reaction modules identified within the model via flux coupling analysis. These correlations were used to identify candidate genes for a reaction in histidine biosynthesis lacking an annotated gene and to highlight metabolic modules with poorly correlated gene fitness scores. Additional genes for reactions involved in biotin, ubiquinone, and pyridoxine biosynthesis in Z. mobilis were identified and confirmed using mutant complementation experiments. These discovered genes were incorporated into the final model, iZM4_478, which contains 747 metabolic and transport reactions (of which 612 have gene-protein-reaction associations), 478 genes, and 616 unique metabolites, making it one of the most complete models of Z. mobilis ZM4 to date. The methods of analysis that we applied here to the Z. mobilis transposon mutant dataset could easily be utilized to improve future genome-scale metabolic reconstructions for organisms where these, or similar, high-throughput datasets are available.
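The correlation step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a genes-by-experiments fitness matrix, uses Pearson correlation (the abstract does not name the measure), and runs on synthetic data. The gene names (`geneA`, `geneX`, and so on), the module membership, and the helper functions `module_correlation` and `rank_candidates` are all hypothetical. In practice the module would come from flux coupling analysis of the model, which is not reproduced here.

```python
import numpy as np


def module_correlation(fitness, genes, module):
    """Mean pairwise Pearson correlation among the genes of one module."""
    idx = [genes.index(g) for g in module]
    corr = np.corrcoef(fitness[idx])  # rows are genes, columns are experiments
    upper = corr[np.triu_indices(len(idx), k=1)]  # each gene pair once
    return float(upper.mean())


def rank_candidates(fitness, genes, module, top=3):
    """Rank genes outside the module by mean correlation with module genes."""
    in_module = set(module)
    idx = [genes.index(g) for g in module]
    corr = np.corrcoef(fitness)  # full gene-by-gene correlation matrix
    scores = {
        g: float(corr[i, idx].mean())
        for i, g in enumerate(genes)
        if g not in in_module
    }
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Synthetic stand-in for a fitness dataset (assumption: 492 experiments,
# matching the count in the abstract; values are random, not real data).
rng = np.random.default_rng(0)
n_experiments = 492
shared = rng.normal(size=n_experiments)  # fitness pattern shared by a pathway

genes = ["geneA", "geneB", "geneC", "geneX", "geneY"]
fitness = np.vstack([
    shared + 0.3 * rng.normal(size=n_experiments),  # geneA (in module)
    shared + 0.3 * rng.normal(size=n_experiments),  # geneB (in module)
    shared + 0.3 * rng.normal(size=n_experiments),  # geneC (in module)
    shared + 0.3 * rng.normal(size=n_experiments),  # geneX (unannotated, same pattern)
    rng.normal(size=n_experiments),                 # geneY (unrelated)
])

module = ["geneA", "geneB", "geneC"]  # hypothetical coupled reaction module
module_score = module_correlation(fitness, genes, module)
candidates = rank_candidates(fitness, genes, module, top=2)
print(f"mean within-module correlation: {module_score:.2f}")
print("candidate genes, best first:", candidates)
```

In this toy setup a module whose genes share a fitness pattern scores high, and a gene outside the module that follows the same pattern ranks first as a candidate for an unannotated reaction. A low within-module score would instead flag a module whose gene assignments deserve a closer look.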