Pham Hieu, Reisner John, Swift Ashley, Olafsson Sigurdur, Vardeman Stephen
Department of Information Systems, Supply Chain, and Analytics, College of Business, The University of Alabama in Huntsville, Huntsville, AL, United States.
Department of Statistics, Iowa State University, Ames, IA, United States.
Front Plant Sci. 2022 Sep 20;13:975976. doi: 10.3389/fpls.2022.975976. eCollection 2022.
Phenotypic variation in plants is attributed to genotype (G), environment (E), and genotype-by-environment interaction (GEI). Although the main effects of G and E are typically larger and easier to model, GEI effects are important and are a critical factor when considering questions such as why some genotypes perform consistently well across a range of environments. In plant breeding, a major challenge is limited information; for example, a single genotype is tested in only a small subset of all possible test environments. The two-way table of phenotype responses will therefore commonly contain missing data. In this paper, we propose a new model of GEI effects that requires only a two-way table of phenotype observations as input, with genotypes as rows and environments as columns, and that does not assume the data are complete. Our analysis handles this scenario by using a novel biclustering algorithm that can accommodate missing values, producing homogeneous cells with no interaction between G and E. In other words, we identify subsets of genotypes and environments within which phenotype can be modeled simply. Based on this, we fit no-interaction models to predict phenotypes of a given crop and to draw insights into how a particular cultivar will perform in the test environments in which it was not evaluated. Our new methodology is validated on data from different plant species and phenotypes and shows superior performance compared to well-studied statistical approaches.
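The prediction step described above can be illustrated with a minimal sketch, assuming a bicluster has already been found. Within one homogeneous cell, a no-interaction model takes the additive form y_ij = mu + a_i + b_j (grand mean, genotype effect, environment effect). The code below fits that form by least squares on the observed cells only, using backfitting (alternating row and column mean updates), and then fills in the missing cells. It does not implement the authors' biclustering algorithm; backfitting is a choice made here for illustration, and the function names `fit_no_interaction` and `predict` are hypothetical. It also assumes that every row and column in the cell has at least one observation and that the observed cells connect all rows and columns.

```python
import numpy as np


def fit_no_interaction(Y, max_iter=1000, tol=1e-12):
    """Least-squares fit of y_ij = mu + a_i + b_j using observed cells only.

    Hypothetical helper for illustration. Y is a 2-D array (genotypes x
    environments) in which NaN marks a missing cell. Assumes every row and
    column has at least one observed value and that the observed cells link
    all rows and columns, so the fitted table is uniquely determined.
    """
    Y = np.asarray(Y, dtype=float)
    mu = np.nanmean(Y)                 # grand mean of the observed cells
    a = np.zeros(Y.shape[0])           # genotype (row) effects
    b = np.zeros(Y.shape[1])           # environment (column) effects
    for _ in range(max_iter):
        # Backfitting: update one set of effects while holding the other fixed.
        # NaN cells propagate into the residuals and are skipped by nanmean.
        a_new = np.nanmean(Y - mu - b[None, :], axis=1)
        b_new = np.nanmean(Y - mu - a_new[:, None], axis=0)
        shift = max(np.max(np.abs(a_new - a)), np.max(np.abs(b_new - b)))
        a, b = a_new, b_new
        if shift < tol:
            break
    return mu, a, b


def predict(mu, a, b):
    """Additive (no-interaction) prediction mu + a_i + b_j for every cell."""
    return mu + a[:, None] + b[None, :]


if __name__ == "__main__":
    # Toy 3 x 4 table that is exactly additive, with three cells unobserved.
    Y = np.array([[10.0, np.nan, 9.0, 15.0],
                  [np.nan, 14.0, 10.0, 16.0],
                  [12.0, 15.0, 11.0, np.nan]])
    mu, a, b = fit_no_interaction(Y)
    print(np.round(predict(mu, a, b), 3))
```

Because the toy table is built as 10 + a_i + b_j with a = (0, 1, 2) and b = (0, 3, -1, 5), the fit reproduces the observed cells and imputes the missing ones as 13, 11, and 17 up to rounding. Separate effects a_i and b_j are only identified up to a shared constant, which is why the sketch reports the fitted table rather than the individual effects.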