Machine learning solutions for integrating partially overlapping genetic datasets and modelling host-endophyte effects in ryegrass (Lolium) dry matter yield estimation.

Author information

Zhu Jiashuai, Malmberg M Michelle, Shinozuka Maiko, Retegan Renata M, Cogan Noel O, Jacobs Joe L, Giri Khageswor, Smith Kevin F

Affiliations

Faculty of Science, The University of Melbourne, Parkville, VIC, Australia.

Agriculture Victoria, AgriBio Centre, Bundoora, VIC, Australia.

Publication information

Front Plant Sci. 2025 May 6;16:1543956. doi: 10.3389/fpls.2025.1543956. eCollection 2025.

Abstract

Plant genetic evaluation often faces challenges due to complex genetic structures. Ryegrass (Lolium spp.), a valuable species for pasture-based agriculture, exhibits heterogeneous genetic diversity among base breeding populations. Partially overlapping datasets from incompatible studies and commercial restrictions further impede the integration of outcomes across studies, complicating the evaluation of key agricultural traits such as dry matter yield (DMY). To address these challenges: (1) we implemented a population genotyping approach to capture the genetic diversity in ryegrass base cultivars; (2) we introduced a machine learning-based strategy to integrate genetic distance matrices (GDMs) from incompatible genotyping approaches, including alignments using multidimensional scaling (MDS) and Procrustes transformation, as well as a novel evaluation strategy (BESMI) for the imputation of structural missing data. Endophytes complicate genetic evaluation by introducing additional variation in phenotypic expression. (3) We modelled the impacts of nine commercial endophytes on ryegrass DMY, enabling a more balanced estimation of untested cultivar-endophyte combinations. (4) Phylogenetic analysis provided a pseudo-pedigree relationship of the 113 ryegrass populations and revealed its associations with DMY variations. Overall, this research offers practical insights for integrating partially overlapping GDMs with structural missing data patterns and facilitates the identification of high-performing ryegrass clades. The methodological advancements, including population sequencing, MDS alignment via Procrustes transformation, and BESMI, extend beyond ryegrass applications.
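The abstract does not describe how the alignment was implemented, so the following is only a minimal sketch of the general idea behind Procrustes alignment of MDS coordinates, not the authors' pipeline. It assumes that each genetic distance matrix has already been reduced to 2-D MDS coordinates, and that some cultivars appear in both studies. The cultivar names, coordinates, and function names (`fit_procrustes_2d`, `apply_transform`) are hypothetical. The transform is fitted on the shared cultivars only and then applied to every cultivar in the second study, which places the non-shared cultivars in the reference coordinate frame.

```python
import math


def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def fit_procrustes_2d(source, target):
    """Least-squares similarity transform (scale, rotation, optional
    reflection, translation) mapping `source` onto `target`.
    Both are equal-length lists of (x, y) for the SAME samples."""
    sc, tc = centroid(source), centroid(target)
    src = [(x - sc[0], y - sc[1]) for x, y in source]
    tgt = [(x - tc[0], y - tc[1]) for x, y in target]
    src_ss = sum(x * x + y * y for x, y in src)
    best = None
    # MDS axes are only defined up to rotation AND reflection, so try both.
    for reflect in (False, True):
        s = [(x, -y) for x, y in src] if reflect else src
        dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(s, tgt))
        cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(s, tgt))
        fit = math.hypot(dot, cross)  # larger value means smaller residual
        if best is None or fit > best["fit"]:
            best = {"fit": fit, "reflect": reflect,
                    "theta": math.atan2(cross, dot), "scale": fit / src_ss,
                    "src_centroid": sc, "tgt_centroid": tc}
    return best


def apply_transform(t, points):
    """Apply a fitted transform to any points from the source study."""
    c, s = math.cos(t["theta"]), math.sin(t["theta"])
    out = []
    for x, y in points:
        x, y = x - t["src_centroid"][0], y - t["src_centroid"][1]
        if t["reflect"]:
            y = -y
        out.append((t["scale"] * (c * x - s * y) + t["tgt_centroid"][0],
                    t["scale"] * (s * x + c * y) + t["tgt_centroid"][1]))
    return out


# Hypothetical 2-D MDS coordinates from a reference study.
reference = {"cv_A": (0.0, 0.0), "cv_B": (1.0, 0.0), "cv_C": (0.0, 2.0)}


def _distort(p, angle=0.7, scale=2.5, shift=(4.0, -1.0)):
    # Simulate a second study whose MDS axes are mirrored, rotated,
    # rescaled and shifted relative to the reference.
    x, y = -p[0], p[1]
    c, s = math.cos(angle), math.sin(angle)
    return (scale * (c * x - s * y) + shift[0],
            scale * (s * x + c * y) + shift[1])


# The second study also contains "cv_D", which the reference lacks.
other = {name: _distort(p)
         for name, p in {**reference, "cv_D": (3.0, 1.0)}.items()}

shared = sorted(set(reference) & set(other))
t = fit_procrustes_2d([other[k] for k in shared],
                      [reference[k] for k in shared])
aligned = dict(zip(other, apply_transform(t, list(other.values()))))
```

In this toy setup the second study is an exact mirrored, rotated, and rescaled copy of the reference geometry, so the alignment recovers `cv_D` at its simulated position. With real GDMs the fit would be approximate, and the residual on the shared cultivars gives a rough indication of how compatible the two studies are.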

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f20b/12100933/c36d387a2fba/fpls-16-1543956-g001.jpg
