EA2106 Biomolécules et Biotechnologies Végétales, Université de Tours, Tours, France.
Zenika, Bordeaux, France.
Methods Mol Biol. 2022;2505:131-140. doi: 10.1007/978-1-0716-2349-7_10.
Elucidating the biosynthetic pathways that lead to specialized metabolites remains a complex task. It is, however, a mandatory step toward bioproduction in heterologous hosts. Many steps have already been identified using conventional approaches, enlarging the space of known possible chemical steps. In recent years, the identification of missing steps has been fueled by the generation of genomic and transcriptomic data for nonmodel species. The analysis of gene expression profiles has revealed that, in many cases, genes encoding enzymes involved in the same biosynthetic pathway are coexpressed across different tissue types and environmental conditions. Hence, coexpression studies, whether in the form of differential gene expression analysis, gene coexpression networks, or unsupervised clustering methods, have helped decipher missing steps and complete our knowledge of biosynthetic pathways. Already identified biosynthetic steps can be used as baits to capture the remaining unknown steps. The present protocol shows how supervised machine learning in the form of artificial neural networks (ANNs) can efficiently classify genes as related to specialized metabolism or not according to their expression levels. Using Catharanthus roseus as an example, we show that an ANN trained on a minimal set of bait genes yields many true positives (correctly predicted genes) while keeping the number of false positives low; these false positives may themselves contain candidate genes.
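To make the bait-based classification idea concrete, the sketch below trains a very small feed-forward network to separate genes that follow a shared expression pattern from genes that do not. It is an illustration only and not the protocol's actual implementation: the data are synthetic, the network is written in plain Python, and every name and value in it (BAIT_PATTERN, TinyANN, the six conditions, the layer size, the learning rate) is an assumption introduced here.

```python
import math
import random

random.seed(0)  # fixed seed so the synthetic example is reproducible

# ASSUMPTION: six hypothetical tissues/conditions, and a made-up expression
# pattern shared by the known pathway ("bait") genes.
N_CONDITIONS = 6
BAIT_PATTERN = [0.1, 0.2, 0.9, 1.0, 0.3, 0.1]


def make_gene(is_pathway):
    """Return one synthetic expression profile (values roughly in [0, 1])."""
    if is_pathway:
        return [v + random.gauss(0, 0.1) for v in BAIT_PATTERN]
    return [random.random() for _ in range(N_CONDITIONS)]


def make_dataset(n_per_class):
    """Build a shuffled list of (profile, label) pairs; label 1 = pathway gene."""
    data = [(make_gene(True), 1) for _ in range(n_per_class)]
    data += [(make_gene(False), 0) for _ in range(n_per_class)]
    random.shuffle(data)
    return data


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


class TinyANN:
    """One tanh hidden layer and one sigmoid output unit, trained by SGD."""

    def __init__(self, n_in, n_hidden):
        self.w1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [random.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, p

    def train(self, data, epochs=300, lr=0.1):
        data = list(data)  # work on a copy so the caller's list is untouched
        for _ in range(epochs):
            random.shuffle(data)
            for x, y in data:
                h, p = self.forward(x)
                d_out = p - y  # cross-entropy gradient at the output unit
                for j, hj in enumerate(h):
                    # backpropagate through tanh before updating w2[j]
                    d_hid = d_out * self.w2[j] * (1.0 - hj * hj)
                    self.w2[j] -= lr * d_out * hj
                    self.b1[j] -= lr * d_hid
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d_hid * xi
                self.b2 -= lr * d_out

    def predict(self, x):
        """Return 1 if the gene is classified as pathway related, else 0."""
        return 1 if self.forward(x)[1] >= 0.5 else 0


train_set = make_dataset(40)  # labeled genes: baits vs. background
test_set = make_dataset(20)   # held-out genes to estimate performance

ann = TinyANN(N_CONDITIONS, n_hidden=4)
ann.train(train_set)

true_pos = sum(1 for x, y in test_set if y == 1 and ann.predict(x) == 1)
false_pos = sum(1 for x, y in test_set if y == 0 and ann.predict(x) == 1)
accuracy = sum(1 for x, y in test_set if ann.predict(x) == y) / len(test_set)
print(f"true positives: {true_pos}, false positives: {false_pos}, "
      f"accuracy: {accuracy:.2f}")
```

On real data, the rows would be normalized expression values per gene across samples and the positive class would be the known pathway genes. Genes predicted positive without being known baits are false positives with respect to the training labels, and they are the natural pool in which to look for candidate genes.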