Montesinos-López Osval A, Montesinos-López Abelardo, Crossa José, Montesinos-López José C, Mota-Sanchez David, Estrada-González Fermín, Gillberg Jussi, Singh Ravi, Mondal Suchismita, Juliana Philomin
Facultad de Telemática, Universidad de Colima, 28040 Colima, México
Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430 Jalisco, México.
G3 (Bethesda). 2018 Jan 4;8(1):131-147. doi: 10.1534/g3.117.300309.
In genomic-enabled prediction, improving the accuracy with which lines are predicted in environments is difficult because the available information is generally sparse and correlations between traits are usually low. In current genomic selection, researchers have large amounts of information and appropriate statistical models to process it, but the computing efficiency needed to do so remains limited. Many statistical models, although mathematically elegant, are computationally inefficient and impractical for many traits, lines, environments, and years because they need to sample from very large multivariate normal distributions. For these reasons, this study explores two recommender systems, item-based collaborative filtering (IBCF) and the matrix factorization algorithm (MF), in the context of multiple traits and multiple environments. The IBCF and MF methods were compared with two conventional methods on simulated and real data. Results from both the simulated and real data sets show that, when the correlation was moderately high, the IBCF technique was slightly better in terms of prediction accuracy than the two conventional methods and the MF method. The IBCF technique is attractive because it produces good predictions when there is high correlation between items (environment-trait combinations) and because its implementation is computationally feasible, which can be useful for plant breeders who deal with very large data sets.
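To make the IBCF idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a lines-by-items matrix in which each column is an environment-trait combination and unobserved phenotypes are stored as NaN. The function name `ibcf_fill`, the use of cosine similarity on column-standardized values, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def ibcf_fill(phenotypes):
    """Fill NaN cells of a lines-by-items matrix with item-based predictions.

    Hypothetical helper: items (columns) are environment-trait combinations,
    rows are lines. Similarity measure and scaling are illustrative choices.
    """
    R = np.asarray(phenotypes, dtype=float)
    observed = ~np.isnan(R)

    # Standardize each item using only its observed values.
    mean = np.nanmean(R, axis=0)
    std = np.nanstd(R, axis=0)
    std[std == 0] = 1.0
    Z = np.where(observed, (R - mean) / std, 0.0)  # missing cells contribute 0

    # Cosine similarity between item columns; an item is not its own neighbor.
    norms = np.linalg.norm(Z, axis=0)
    norms[norms == 0] = 1.0
    S = (Z.T @ Z) / np.outer(norms, norms)
    np.fill_diagonal(S, 0.0)

    # Similarity-weighted average of each line's observed items.
    numerator = Z @ S
    denominator = observed.astype(float) @ np.abs(S)
    safe_denominator = np.where(denominator > 0, denominator, 1.0)
    Z_hat = np.where(denominator > 0, numerator / safe_denominator, 0.0)

    # Back-transform to the original scale and keep observed cells untouched.
    predicted = Z_hat * std + mean
    return np.where(observed, R, predicted)

# Toy data: 5 lines, 3 environment-trait combinations, one phenotype missing.
toy = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 4.0],
    [3.0, 6.0, 3.0],
    [4.0, 8.0, 2.0],
    [5.0, np.nan, 1.0],
])
filled = ibcf_fill(toy)
print(filled[4, 1])
```

In this sketch the missing value is predicted from the same line's phenotypes in the other two items, weighted by how similar those items are to the target item. This is why the approach depends on reasonably high correlation between environment-trait combinations, and why it only needs an item-by-item similarity matrix rather than sampling from a large multivariate distribution.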