Phenotype prediction in plants is improved by integrating large-scale transcriptomic datasets.

Author information

Wu Zefeng, Sun Yali, Zhao Xiaoqiang, Liu Zigang, Zhou Wenqi, Niu Yining

Affiliations

State Key Laboratory of Aridland Crop Science, Gansu Agricultural University, No. 1 Yingmen Village, Anning District, Lanzhou 730070, Gansu Province, China.

Crop Research Institute, Gansu Academy of Agricultural Sciences, No. 1, New Village, Anning District, Lanzhou 730070, Gansu Province, China.

Publication information

NAR Genom Bioinform. 2024 Dec 27;6(4):lqae184. doi: 10.1093/nargab/lqae184. eCollection 2024 Dec.

DOI:10.1093/nargab/lqae184

PMID:39735343

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11672113/

Abstract

Research on the dynamic expression of genes in plants is important for understanding different biological processes. We used the large amounts of publicly available transcriptomic data from various plant sample sources to investigate whether the expression levels of a subset of highly variable genes (HVGs) can be used to accurately identify the phenotypes of plants. Using maize (Zea mays L.) as an example, we built machine learning (ML) models to predict phenotypes from a gene expression dataset of 21 612 bulk RNA sequencing samples. We showed that the ML models achieved excellent prediction accuracy using only the HVGs to identify different phenotypes, including tissue types, developmental stages, cultivars and stress conditions. The ML models also identified several important functional genes associated with different phenotypes. We performed a similar analysis in rice (Oryza sativa L.) and found that the ML models could be generalized across species. However, the models trained on maize did not perform well in rice, probably because of the expression divergence of the conserved HVGs between the two species. Overall, our results provide an ML framework for phenotype prediction using gene expression profiles, which may contribute to precision management of crops in agricultural practices.
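The workflow summarized in the abstract (select highly variable genes, then train a classifier on them) can be illustrated with a minimal sketch. The abstract does not say how HVGs were chosen or which ML models were used, so everything here is an assumption: HVGs are ranked by the variance of log2(x + 1) expression, a nearest-centroid classifier stands in for the ML model, the expression values are synthetic, and all function names are hypothetical.

```python
import math
import random
from statistics import mean, pvariance


def log_expr(value):
    """log2(x + 1) transform, a common choice for expression values (assumption)."""
    return math.log2(value + 1)


def select_hvgs(expr, k):
    """Return indices of the k genes with the highest variance of log expression."""
    n_genes = len(expr[0])
    variances = [pvariance([log_expr(s[g]) for s in expr]) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: variances[g], reverse=True)[:k]


def fit_centroids(expr, labels, genes):
    """Compute the mean log-expression vector of each phenotype over the chosen genes."""
    centroids = {}
    for lab in set(labels):
        rows = [s for s, l in zip(expr, labels) if l == lab]
        centroids[lab] = [mean(log_expr(r[g]) for r in rows) for g in genes]
    return centroids


def predict(sample, centroids, genes):
    """Assign the phenotype whose centroid is closest in Euclidean distance."""
    x = [log_expr(sample[g]) for g in genes]
    return min(centroids, key=lambda lab: math.dist(x, centroids[lab]))


def simulate(n_per_class, n_genes, rng):
    """Synthetic samples: genes 0-4 differ between 'leaf' and 'root'; the rest are noise."""
    expr, labels = [], []
    for lab, marker_level in (("leaf", 200.0), ("root", 5.0)):
        for _ in range(n_per_class):
            sample = [max(0.0, rng.gauss(50.0, 5.0)) for _ in range(n_genes)]
            for g in range(5):
                sample[g] = max(0.0, rng.gauss(marker_level, marker_level * 0.1))
            expr.append(sample)
            labels.append(lab)
    return expr, labels


rng = random.Random(0)
train_x, train_y = simulate(30, 50, rng)
test_x, test_y = simulate(10, 50, rng)

hvgs = select_hvgs(train_x, k=5)                 # HVGs come from training data only
centroids = fit_centroids(train_x, train_y, hvgs)
preds = [predict(s, centroids, hvgs) for s in test_x]
accuracy = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
print(sorted(hvgs), accuracy)
```

Selecting HVGs from the training samples alone keeps information from the held-out samples out of the feature set. In this toy setting the tissue-specific marker genes dominate the variance ranking, which is why a handful of genes is enough to separate the two classes. Real bulk RNA-seq data would need normalization and batch handling that this sketch omits.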

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac89/11672113/02a15a130765/lqae184fig1.jpg

Similar articles

TrG2P: A transfer-learning-based tool integrating multi-trait data for accurate prediction of crop yield.

Plant Commun. 2024 Jul 8;5(7):100975. doi: 10.1016/j.xplc.2024.100975. Epub 2024 May 15.

Genomic prediction models for traits differing in heritability for soybean, rice, and maize.

BMC Plant Biol. 2022 Feb 26;22(1):87. doi: 10.1186/s12870-022-03479-y.

Comparative transcriptomic and physiological analyses of contrasting hybrid cultivars ND476 and ZX978 identify important differentially expressed genes and pathways regulating drought stress tolerance in maize.

Genes Genomics. 2020 Aug;42(8):937-955. doi: 10.1007/s13258-020-00962-4. Epub 2020 Jul 4.

Expression of OsMYB55 in maize activates stress-responsive genes and enhances heat and drought tolerance.

BMC Genomics. 2016 Apr 29;17:312. doi: 10.1186/s12864-016-2659-5.

Genome-wide identification and analysis of WRKY gene family in maize provide insights into regulatory network in response to abiotic stresses.

BMC Plant Biol. 2021 Sep 20;21(1):427. doi: 10.1186/s12870-021-03206-z.

Gene expression biomarkers provide sensitive indicators of in planta nitrogen status in maize.

Plant Physiol. 2011 Dec;157(4):1841-52. doi: 10.1104/pp.111.187898. Epub 2011 Oct 6.

Gene coexpression network alignment and conservation of gene modules between two grass species: maize and rice.

Plant Physiol. 2011 Jul;156(3):1244-56. doi: 10.1104/pp.111.173047. Epub 2011 May 23.

NetREx: Network-based Rice Expression Analysis Server for abiotic stress conditions.

Database (Oxford). 2022 Aug 6;2022. doi: 10.1093/database/baac060.

Identification of genes specifically or preferentially expressed in maize silk reveals similarity and diversity in transcript abundance of different dry stigmas.

BMC Genomics. 2012 Jul 2;13:294. doi: 10.1186/1471-2164-13-294.
