Machine learning assists prediction of genes responsible for plant specialized metabolite biosynthesis by integrating multi-omics data.

Affiliations

College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, 030024, China.

Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518000, China.

Publication information

BMC Genomics. 2024 Apr 29;25(1):418. doi: 10.1186/s12864-024-10258-6.

Abstract

BACKGROUND

Plant specialized (or secondary) metabolites (PSM), also known as phytochemicals, natural products, or plant constituents, play essential roles in interactions between plants and their environment. Although many research efforts have focused on discovering novel metabolites and their biosynthetic genes, the resolution of metabolic pathways and the identification of biosynthetic genes have been limited by rudimentary analysis approaches and the enormous number of candidate genes.

RESULTS

Here we combined the state-of-the-art automated machine learning (ML) framework AutoGluon-Tabular with multi-omics data from Arabidopsis to predict genes encoding enzymes involved in the biosynthesis of plant specialized metabolites (PSM), focusing on the three main PSM categories: terpenoids, alkaloids, and phenolics. We found that genomics-related and proteomics-related features were the two categories contributing most to model performance. Using only these key features, we built a new model in Arabidopsis that performed better than models built with more features, including those related to transcriptomics and epigenomics. Finally, the models were validated in maize and tomato; models trained on data from the two other species and tested on maize performed as well as or better than intraspecies predictions.
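The two design choices described above (restricting the model to genomics and proteomics features, and training on other species before testing on a held-out one) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the `<category>__<name>` column naming, the `species` and `is_psm_gene` fields, and the `f1` metric are all hypothetical choices, since the abstract does not give feature names or evaluation metrics. The `fit_and_score` helper needs `pandas` and `autogluon.tabular` installed and imports them lazily.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: every feature column is named "<category>__<feature>".
KEY_CATEGORIES = ("genomics", "proteomics")


def select_columns(columns: Sequence[str], categories: Sequence[str]) -> List[str]:
    """Keep only columns whose category prefix is in `categories`."""
    return [c for c in columns if c.split("__", 1)[0] in categories]


def cross_species_split(rows: List[Dict], test_species: str) -> Tuple[List[Dict], List[Dict]]:
    """Train on every species except `test_species`; test on that species."""
    train = [r for r in rows if r["species"] != test_species]
    test = [r for r in rows if r["species"] == test_species]
    return train, test


def fit_and_score(train_rows, test_rows, feature_cols, label="is_psm_gene"):
    """Fit AutoGluon-Tabular on the chosen features and score held-out rows.

    The label name and the F1 metric are illustrative assumptions.
    """
    import pandas as pd
    from autogluon.tabular import TabularPredictor

    cols = list(feature_cols) + [label]
    train_df = pd.DataFrame(train_rows)[cols]
    test_df = pd.DataFrame(test_rows)[cols]
    predictor = TabularPredictor(label=label, eval_metric="f1").fit(train_df)
    return predictor.evaluate(test_df)  # dict of metric name -> score
```

With a gene-level table loaded as a list of dictionaries, one might call `cross_species_split(rows, "maize")` followed by `fit_and_score(train, test, select_columns(list(rows[0]), KEY_CATEGORIES))` to mimic the cross-species test on maize.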

CONCLUSIONS

External validation in grape and poppy indicated that our model is applicable to other species. It also showed enormous potential to improve the prediction of PSM-synthesizing enzymes by including valid data from a wider range of species.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3965/11057162/b7e3dcadb53e/12864_2024_10258_Fig1_HTML.jpg