Phenotype prediction in plants is improved by integrating large-scale transcriptomic datasets.

Author information

Wu Zefeng, Sun Yali, Zhao Xiaoqiang, Liu Zigang, Zhou Wenqi, Niu Yining

Affiliations

State Key Laboratory of Aridland Crop Science, Gansu Agricultural University, No. 1 Yingmen Village, Anning District, Lanzhou 730070, Gansu Province, China.

Crop Research Institute, Gansu Academy of Agricultural Sciences, No. 1, New Village, Anning District, Lanzhou 730070, Gansu Province, China.

Publication information

NAR Genom Bioinform. 2024 Dec 27;6(4):lqae184. doi: 10.1093/nargab/lqae184. eCollection 2024 Dec.

Abstract

Research on the dynamic expression of genes in plants is important for understanding diverse biological processes. We used the large amount of publicly available transcriptomic data from various plant sample sources to investigate whether the expression levels of a subset of highly variable genes (HVGs) can be used to accurately identify plant phenotypes. Using maize (Zea mays L.) as an example, we built machine learning (ML) models to predict phenotypes from a gene expression dataset of 21 612 bulk RNA sequencing samples. We showed that the ML models achieved excellent prediction accuracy using only the HVGs to identify different phenotypes, including tissue types, developmental stages, cultivars and stress conditions. Using the ML models, we identified several important functional genes associated with different phenotypes. We performed a similar analysis in rice (Oryza sativa L.) and found that the ML models could be generalized across species. However, the models trained on maize did not perform well in rice, probably because of the expression divergence of the conserved HVGs between the two species. Overall, our results provide an ML framework for phenotype prediction using gene expression profiles, which may contribute to the precision management of crops in agricultural practice.
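The abstract does not specify how HVGs were selected or which ML models were used, so the following is only a minimal illustrative sketch of the general idea (select HVGs, then train a classifier on those genes alone). It assumes a samples × genes matrix of non-negative expression values, a log2(x + 1) transform, HVGs ranked by plain variance across samples, and a nearest-centroid classifier swapped in for simplicity. All function names and the toy tissue data are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, pvariance


def log_transform(matrix):
    """Apply log2(x + 1) to a samples x genes matrix of non-negative values."""
    return [[math.log2(v + 1.0) for v in row] for row in matrix]


def select_hvgs(matrix, gene_ids, n_top):
    """Hypothetical HVG criterion: the n_top genes with the highest variance across samples."""
    scored = []
    for j, gene in enumerate(gene_ids):
        column = [row[j] for row in matrix]
        scored.append((pvariance(column), gene))
    # Sort by variance (descending); break ties by gene ID so the result is deterministic
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [gene for _, gene in scored[:n_top]]


def subset(matrix, gene_ids, keep):
    """Keep only the columns of the matrix whose gene IDs are listed in `keep`."""
    idx = [gene_ids.index(g) for g in keep]
    return [[row[j] for j in idx] for row in matrix]


class NearestCentroid:
    """Simple classifier: assign each sample to the phenotype with the closest mean profile."""

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        # Per-phenotype centroid = mean expression of each selected gene
        self.centroids_ = {
            label: [mean(col) for col in zip(*rows)] for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        return [
            min(self.centroids_, key=lambda lab: sq_dist(row, self.centroids_[lab]))
            for row in X
        ]


# Hypothetical toy data: 6 training samples x 4 genes, labeled by tissue type
genes = ["g1", "g2", "g3", "g4"]
train = [
    [100, 5, 10, 10], [120, 6, 10, 11], [110, 4, 10, 9],  # leaf
    [2, 90, 10, 10], [3, 100, 10, 11], [1, 95, 10, 9],    # root
]
labels = ["leaf", "leaf", "leaf", "root", "root", "root"]

log_train = log_transform(train)
hvgs = select_hvgs(log_train, genes, n_top=2)  # g3 and g4 barely vary, so they are dropped
model = NearestCentroid().fit(subset(log_train, genes, hvgs), labels)

new_samples = [[105, 5, 10, 10], [2, 98, 10, 10]]
predictions = model.predict(subset(log_transform(new_samples), genes, hvgs))
print(predictions)  # ['leaf', 'root']
```

Ranking by raw variance is the simplest possible HVG criterion; real pipelines often correct for the mean-variance relationship of count data, and a stronger classifier would typically replace the nearest-centroid model. The same sketch also hints at why cross-species transfer can fail: a model fitted on one species' centroids only works if the selected genes (for example, conserved orthologs) keep similar expression patterns in the other species.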
