Multi-omics assists genomic prediction of maize yield with machine learning approaches.

Author information

Wu Chengxiu, Luo Jingyun, Xiao Yingjie

Affiliations

National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070 China.

Hubei Hongshan Laboratory, Wuhan, 430070 China.

Publication information

Mol Breed. 2024 Feb 8;44(2):14. doi: 10.1007/s11032-024-01454-z. eCollection 2024 Feb.

Abstract

With the improvement of high-throughput technologies in recent years, large volumes of multi-dimensional plant omics data have been generated, and big-data-driven yield prediction has received increasing attention. Machine learning offers promising computational and analytical tools for interpreting the biological meaning of such large crop datasets. In this study, we used multi-omics datasets from 156 maize recombinant inbred lines, comprising 2496 single nucleotide polymorphisms (SNPs), 46 image traits (i-traits) collected at 16 developmental stages with an automatic phenotyping platform, and 133 primary metabolites. In benchmark tests of different types of prediction models, several machine learning methods, such as Partial Least Squares (PLS), Random Forest (RF), and Gaussian process with a radial basis function kernel (GaussprRadial), achieved better prediction of maize yield, albeit with slight differences in method preference among the i-trait, genomic, and metabolic data. We found that better yield prediction may result from differences in the methods' ability to rank and filter data features, and this feature ranking was found to be linked to biologically meaningful regulation, such as photosynthesis-related or kernel development-related processes. Finally, by integrating multiple omics data with the RF machine learning approach, we further improved the prediction accuracy for grain yield from 0.32 to 0.43. Our research provides new ideas for applying plant omics data and artificial intelligence approaches to facilitate crop genetic improvement.
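The abstract compares single-omics and integrated multi-omics yield prediction. The sketch below illustrates one way such a comparison can be set up; it is not the authors' pipeline. It assumes (i) integration by simple feature-level concatenation of standardized omics blocks, (ii) "prediction accuracy" taken as the Pearson correlation between predicted and observed values under k-fold cross-validation, and (iii) Gaussian-process regression with an RBF kernel, k(a, b) = exp(-gamma * ||a - b||^2), standing in for the GaussprRadial model named above. The paper's best integrated model was RF, which is omitted here only to keep the example dependency-free. All function names, the kernel parameterization, `gamma`/`noise` values, and the toy data are hypothetical.

```python
import math
import random


def standardize_columns(X):
    """Scale each feature column to mean 0 and unit (population) variance."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    sds = []
    for j in range(d):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        sds.append(math.sqrt(var) or 1.0)  # constant column: keep it at 0 instead of dividing by 0
    return [[(row[j] - means[j]) / sds[j] for j in range(d)] for row in X]


def concat_blocks(*blocks):
    """Feature-level (early) integration: join each line's features across omics blocks.

    Assumes every block lists the same lines in the same order (an assumption of this sketch).
    """
    n = len(blocks[0])
    if any(len(b) != n for b in blocks):
        raise ValueError("all omics blocks must cover the same lines")
    return [[v for block in blocks for v in block[i]] for i in range(n)]


def rbf_kernel(a, b, gamma):
    """RBF kernel exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def cholesky_solve(A, b):
    """Solve A x = b for symmetric positive-definite A via Cholesky (A = L L^T)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * n  # forward substitution: L z = b
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n  # back substitution: L^T x = z
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(X_train, y_train, X_test, gamma=0.1, noise=0.01):
    """Posterior mean of GP regression: mean(y) + k_*^T (K + noise*I)^-1 (y - mean(y))."""
    n = len(X_train)
    y_mean = sum(y_train) / n
    K = [[rbf_kernel(X_train[i], X_train[j], gamma) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = cholesky_solve(K, [y - y_mean for y in y_train])
    return [y_mean + sum(rbf_kernel(x, X_train[i], gamma) * alpha[i] for i in range(n))
            for x in X_test]


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cv_accuracy(X, y, k=5, seed=0, **gp_args):
    """k-fold cross-validated accuracy, taken here (assumption) as Pearson r of predicted vs. observed."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(y)
    for f in range(k):
        test = idx[f::k]
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        fold_pred = gp_predict([X[i] for i in train], [y[i] for i in train],
                               [X[i] for i in test], **gp_args)
        for i, p in zip(test, fold_pred):
            preds[i] = p
    return pearson(preds, y)


if __name__ == "__main__":
    # Hypothetical toy data for 40 lines: not the study's SNP, i-trait, or metabolite data.
    rng = random.Random(0)
    n = 40
    snps = [[float(rng.choice([0, 2])) for _ in range(8)] for _ in range(n)]
    itraits = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]
    metabolites = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    yield_ = [s[0] + 0.5 * t[1] + 0.3 * m[2] + rng.gauss(0, 0.5)
              for s, t, m in zip(snps, itraits, metabolites)]
    blocks = [standardize_columns(b) for b in (snps, itraits, metabolites)]
    for name, X in [("SNPs only", blocks[0]), ("all three blocks", concat_blocks(*blocks))]:
        print(name, round(cv_accuracy(X, yield_, k=5), 3))
```

Concatenation is the simplest integration strategy; with real data each block would hold one feature vector per line, and `gp_predict` could be swapped for an RF or PLS model to reproduce the kind of method comparison the abstract describes. Note that standardizing on the full matrix before cross-validation is a simplification; a stricter setup would fit the scaling on each training fold only.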

SUPPLEMENTARY INFORMATION

The online version contains supplementary material available at 10.1007/s11032-024-01454-z.

Similar articles

A high-throughput and low-cost maize ear traits scorer.
Mol Breed. 2021 Feb 13;41(2):17. doi: 10.1007/s11032-021-01205-4. eCollection 2021 Feb.
