Machine learning assists prediction of genes responsible for plant specialized metabolite biosynthesis by integrating multi-omics data.

Affiliations

College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, 030024, China.

Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China.

Publication information

BMC Genomics. 2024 Apr 29;25(1):418. doi: 10.1186/s12864-024-10258-6.

DOI:10.1186/s12864-024-10258-6

PMID:38679745

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11057162/

Abstract

BACKGROUND

Plant specialized (or secondary) metabolites (PSM), also known as phytochemicals, natural products, or plant constituents, play essential roles in interactions between plants and their environment. Although many research efforts have focused on discovering novel metabolites and their biosynthetic genes, the resolution of metabolic pathways and the identification of biosynthetic genes have been limited by rudimentary analysis approaches and the enormous number of candidate genes.

RESULTS

Here we integrated the state-of-the-art automated machine learning (ML) framework AutoGluon-Tabular with multi-omics data from Arabidopsis to predict genes encoding enzymes involved in PSM biosynthesis, focusing on the three main PSM categories: terpenoids, alkaloids, and phenolics. We found that genomics-related and proteomics-related features were the two feature categories that contributed most to model performance. Using only these key features, we built a new Arabidopsis model that outperformed models built with more features, including those related to transcriptomics and epigenomics. Finally, the models were validated in maize and tomato; models tested on maize and trained with data from the two other species performed as well as or better than intraspecies predictions.
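The cross-species comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the study used AutoGluon-Tabular, whereas this example swaps in a tiny nearest-centroid classifier so that it runs with the Python standard library alone. The gene records, the two feature values per gene, and all function names are hypothetical and chosen only to show the shape of the evaluation (train on two species, test on the third, and compare with a leave-one-out estimate within the target species).

```python
from statistics import mean

# Hypothetical toy records: (species, (feature_1, feature_2), label).
# label 1 = PSM biosynthetic gene, 0 = other gene. All values are invented.
GENES = [
    ("arabidopsis", (0.90, 0.80), 1),
    ("arabidopsis", (0.80, 0.90), 1),
    ("arabidopsis", (0.10, 0.20), 0),
    ("arabidopsis", (0.20, 0.10), 0),
    ("tomato", (0.85, 0.70), 1),
    ("tomato", (0.70, 0.90), 1),
    ("tomato", (0.30, 0.20), 0),
    ("tomato", (0.15, 0.30), 0),
    ("maize", (0.80, 0.75), 1),
    ("maize", (0.60, 0.90), 1),
    ("maize", (0.25, 0.10), 0),
    ("maize", (0.20, 0.35), 0),
]


def rows_for(species):
    """Select (features, label) pairs for the given set of species."""
    return [(f, y) for s, f, y in GENES if s in species]


def fit_centroids(rows):
    """Compute one mean feature vector (centroid) per class label."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*vectors))
        for label, vectors in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(train_rows, test_rows):
    """Fraction of test genes classified correctly by a model fit on train_rows."""
    centroids = fit_centroids(train_rows)
    hits = sum(predict(centroids, f) == y for f, y in test_rows)
    return hits / len(test_rows)


def compare(target, others):
    """Cross-species accuracy versus leave-one-out accuracy within the target."""
    target_rows = rows_for({target})
    cross = accuracy(rows_for(set(others)), target_rows)
    hits = 0
    for i, (features, label) in enumerate(target_rows):
        held_in = target_rows[:i] + target_rows[i + 1:]
        hits += predict(fit_centroids(held_in), features) == label
    return {"cross_species": cross, "intraspecies": hits / len(target_rows)}


RESULT = compare("maize", ["arabidopsis", "tomato"])
print(RESULT)
```

With real data, accuracy would typically be replaced by a metric suited to imbalanced classes, since biosynthetic genes are a small fraction of any genome; that choice is an assumption here rather than something stated in the abstract.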

CONCLUSIONS

Our external validation in grape and poppy suggests that the model is applicable to other species, and also shows considerable potential for improving the prediction of PSM-synthesizing enzymes by including valid data from a wider range of species.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3965/11057162/b7e3dcadb53e/12864_2024_10258_Fig1_HTML.jpg
