Department of Physics, University of Oxford, Oxford, UK.
School of Informatics, University of Edinburgh, Edinburgh, UK.
Methods Mol Biol. 2024;2760:345-369. doi: 10.1007/978-1-0716-3658-9_20.
The identification of essential genes is a key challenge in systems and synthetic biology, particularly for engineering metabolic pathways that convert feedstocks into valuable products. Assessing gene essentiality at genome scale requires large and costly growth assays of knockout strains. Here we describe a strategy to predict the essentiality of metabolic genes using binary classification algorithms. The approach combines elements from genome-scale metabolic models, directed graphs, and machine learning into a predictive model that can be trained on small knockout datasets. We demonstrate the efficacy of this approach using the most complete metabolic model of Escherichia coli and various machine learning algorithms for binary classification.
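The general idea of combining a metabolic network, a directed graph, and a binary classifier can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the chapter's actual pipeline: the six-reaction network, the essentiality labels, the two graph features (share of source-to-sink paths through a node, and total degree), and the nearest-centroid classifier, which stands in for whichever machine learning algorithms the authors used. Each reaction is treated as a proxy for one gene.

```python
from functools import lru_cache

# Hypothetical toy network: reaction id -> (substrates, products).
# Each reaction stands in for a single metabolic gene (an assumption).
REACTIONS = {
    "R1": ({"glc"}, {"g6p"}),
    "R2": ({"g6p"}, {"f6p"}),
    "R3": ({"f6p"}, {"pyr"}),
    "R4": ({"g6p"}, {"6pg"}),
    "R5": ({"6pg"}, {"pyr"}),
    "R6": ({"pyr"}, {"biomass"}),
}


def build_digraph(reactions):
    """Add edge a -> b when a product of reaction a is a substrate of b."""
    edges = {r: set() for r in reactions}
    for a, (_, products) in reactions.items():
        for b, (substrates, _) in reactions.items():
            if a != b and products & substrates:
                edges[a].add(b)
    return edges


def features(edges):
    """Per-node features: (share of source-to-sink paths, total degree).

    Path counting assumes the graph is acyclic.
    """
    preds = {n: {m for m in edges if n in edges[m]} for n in edges}

    @lru_cache(maxsize=None)
    def down(n):  # number of paths from n to any sink
        return 1 if not edges[n] else sum(down(m) for m in edges[n])

    @lru_cache(maxsize=None)
    def up(n):  # number of paths from any source to n
        return 1 if not preds[n] else sum(up(m) for m in preds[n])

    total = sum(down(n) for n in edges if not preds[n])
    return {
        n: (up(n) * down(n) / total, len(preds[n]) + len(edges[n]))
        for n in edges
    }


def fit_centroids(X, y):
    """Nearest-centroid training: one mean feature vector per class."""
    centroids = {}
    for label in set(y.values()):
        rows = [X[n] for n in X if y[n] == label]
        centroids[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return centroids


def predict(centroids, x):
    """Return the class whose centroid is closest in squared distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


if __name__ == "__main__":
    edges = build_digraph(REACTIONS)
    X = features(edges)
    # Hypothetical "small knockout data": labels for three reactions only
    # (1 = essential, 0 = non-essential), invented for this sketch.
    train_labels = {"R1": 1, "R2": 0, "R4": 0}
    model = fit_centroids({n: X[n] for n in train_labels}, train_labels)
    for node in sorted(set(X) - set(train_labels)):
        print(node, predict(model, X[node]))
```

In this toy network the two parallel branches (R2/R3 and R4/R5) make each branch reaction dispensable for reaching the biomass node, while R1 and R6 lie on every path, so a path-based graph feature separates the two classes even with only three labeled examples. A real application would derive the graph from a genome-scale model and use experimentally measured knockout labels.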