A directed learning strategy integrating multiple omic data improves genomic prediction.

Affiliations

College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, Wuhan, Hubei, China.

Department of Botany and Plant Sciences, University of California, Riverside, CA, USA.

Publication information

Plant Biotechnol J. 2019 Oct;17(10):2011-2020. doi: 10.1111/pbi.13117. Epub 2019 Apr 14.

DOI:10.1111/pbi.13117

PMID:30950198

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6737184/

Abstract

Genomic prediction (GP) aims to construct a statistical model for predicting phenotypes using genome-wide markers and is a promising strategy for accelerating molecular plant breeding. However, current progress of phenotype prediction using genomic data alone has reached a bottleneck, and previous studies on transcriptomic and metabolomic predictions ignored genomic information. Here, we designed a novel strategy of GP called multilayered least absolute shrinkage and selection operator (MLLASSO) by integrating multiple omic data into a single model that iteratively learns three layers of genetic features (GFs) supervised by observed transcriptome and metabolome. Significantly, MLLASSO learns higher order information of gene interactions, which enables us to achieve a significant improvement of predictability of yield in rice from 0.1588 (GP alone) to 0.2451 (MLLASSO). In the prediction of the first two layers, some genes were found to be genetically predictable genes (GPGs) as their expressions were accurately predicted with genetic markers. Interestingly, we made three dramatic discoveries for the GPGs: (i) GPGs are good predictors for highly complex traits like yield; (ii) GPGs are mostly eQTL genes (cis or trans); and (iii) trait-related transcriptional factor families are enriched in GPGs. These findings support the notion that learned GFs not only are good predictors for traits but also have specific biological implications regarding regulation of gene expressions. To differentiate the new method from conventional GP models, we called MLLASSO a directed learning strategy supervised by intermediate omic data. This new prediction model appears to be more reliable and more robust than conventional GP models.
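The layered idea in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the abstract does not specify how the three layers are wired, so the chain assumed here (markers → transcripts, first-layer genetic features → metabolites, second-layer genetic features → trait) is an assumption, as are all function names, the penalty value `lam`, and the synthetic data. The LASSO solver is a plain coordinate-descent routine written out so the example has no dependencies.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_fit(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # centred features
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual while all coefficients are zero
    scale = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if scale[j] == 0.0:  # a constant column carries no signal
                continue
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + scale[j] * beta[j]
            new = soft_threshold(rho, lam) / scale[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    intercept = y_mean - sum(m * b for m, b in zip(means, beta))
    return intercept, beta

def predict(model, X):
    """Predict one target for every row of X."""
    b0, beta = model
    return [b0 + sum(x * b for x, b in zip(row, beta)) for row in X]

def predict_layer(models, X):
    """Predict every target of a layer; returns one row per individual."""
    cols = [predict(m, X) for m in models]
    return [list(vals) for vals in zip(*cols)]

def fit_layer(X, Y, lam):
    """Fit one LASSO per column of Y; return the models and their fitted values."""
    models = [lasso_fit(X, [row[k] for row in Y], lam) for k in range(len(Y[0]))]
    return models, predict_layer(models, X)

def mllasso_fit(markers, transcripts, metabolites, trait, lam=0.01):
    """Assumed three-layer chain, each layer supervised by observed omic data."""
    layer1, gf1 = fit_layer(markers, transcripts, lam)  # markers -> transcripts
    layer2, gf2 = fit_layer(gf1, metabolites, lam)      # GF layer 1 -> metabolites
    layer3 = lasso_fit(gf2, trait, lam)                 # GF layer 2 -> trait
    return layer1, layer2, layer3

def mllasso_predict(model, markers):
    """Prediction needs markers only; omic data are used during training."""
    layer1, layer2, layer3 = model
    return predict(layer3, predict_layer(layer2, predict_layer(layer1, markers)))

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Synthetic toy data (hypothetical, not from the paper): 60 lines, 6 markers.
rng = random.Random(0)
markers = [[float(rng.randint(0, 1)) for _ in range(6)] for _ in range(60)]
transcripts = [[m[0] - m[1] + rng.gauss(0, 0.1),
                m[2] + m[3] + rng.gauss(0, 0.1),
                m[4] - m[5] + rng.gauss(0, 0.1)] for m in markers]
metabolites = [[t[0] + t[1] + rng.gauss(0, 0.1),
                t[1] - t[2] + rng.gauss(0, 0.1)] for t in transcripts]
trait = [2.0 * q[0] + q[1] + rng.gauss(0, 0.1) for q in metabolites]

model = mllasso_fit(markers, transcripts, metabolites, trait)
pred = mllasso_predict(model, markers)
print("in-sample correlation:", round(pearson(pred, trait), 3))
```

Because every layer here is linear, the composite predictor is still linear in the markers, so this toy does not reproduce the higher-order interaction effects or the predictability figures reported in the abstract. It only shows the data flow: intermediate omic layers supervise training, while prediction for a new line requires genotypes alone. A realistic evaluation would also use cross-validation rather than the in-sample correlation printed above.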

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9100/11386793/a5ac6d88ea64/PBI-17-2011-g003.jpg
