College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, Wuhan, Hubei, China.
Department of Botany and Plant Sciences, University of California, Riverside, CA, USA.
Plant Biotechnol J. 2019 Oct;17(10):2011-2020. doi: 10.1111/pbi.13117. Epub 2019 Apr 14.
Genomic prediction (GP) aims to construct a statistical model for predicting phenotypes from genome-wide markers and is a promising strategy for accelerating molecular plant breeding. However, phenotype prediction from genomic data alone has reached a bottleneck, and previous studies on transcriptomic and metabolomic prediction ignored genomic information. Here, we designed a novel GP strategy called multilayered least absolute shrinkage and selection operator (MLLASSO), which integrates multiple omic data into a single model that iteratively learns three layers of genetic features (GFs) supervised by the observed transcriptome and metabolome. Notably, MLLASSO learns higher-order information on gene interactions, which improved the predictability of yield in rice from 0.1588 (GP alone) to 0.2451 (MLLASSO). In the prediction of the first two layers, some genes were found to be genetically predictable genes (GPGs), as their expression was accurately predicted from genetic markers. We made three notable findings about the GPGs: (i) GPGs are good predictors for highly complex traits like yield; (ii) GPGs are mostly eQTL genes (cis or trans); and (iii) trait-related transcription factor families are enriched in GPGs. These findings support the notion that learned GFs are not only good predictors of traits but also have specific biological implications for the regulation of gene expression. To differentiate the new method from conventional GP models, we call MLLASSO a directed learning strategy supervised by intermediate omic data. This new prediction model appears to be more reliable and more robust than conventional GP models.
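The layered idea described in the abstract can be sketched in a few lines of Python. This is only an illustration of one plausible data flow, not the authors' implementation: it assumes that layer 1 predicts transcripts from markers, layer 2 predicts metabolites from markers plus the layer-1 predictions, and layer 3 predicts the trait from all of these. The function names (`lasso_fit`, `mllasso_fit`, `mllasso_predict`, `demo`), the single shared penalty `alpha`, and the simulated data are all assumptions made for the example. It does not model explicit interaction terms and makes no attempt to reproduce the reported predictabilities.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_fit(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + alpha*||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = sum(y) / n
    resid = [yi - intercept for yi in y]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (beta_j removed).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, alpha) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
        # The intercept is unpenalized: absorb the mean residual into it.
        shift = sum(resid) / n
        intercept += shift
        resid = [r - shift for r in resid]
    return intercept, beta


def lasso_predict(model, X):
    b0, beta = model
    return [b0 + sum(b * x for b, x in zip(beta, row)) for row in X]


def fit_layer(X, targets, alpha):
    """Fit one LASSO per target column (targets is a list of rows)."""
    columns = [[row[k] for row in targets] for k in range(len(targets[0]))]
    return [lasso_fit(X, col, alpha) for col in columns]


def layer_features(models, X):
    """Predicted intermediate values, used as learned genetic features."""
    cols = [lasso_predict(m, X) for m in models]
    return [list(row) for row in zip(*cols)]


def hstack(A, B):
    return [a + b for a, b in zip(A, B)]


def mllasso_fit(G, T, M, y, alpha):
    """Assumed flow: markers -> transcripts -> metabolites -> trait."""
    layer1 = fit_layer(G, T, alpha)
    X2 = hstack(G, layer_features(layer1, G))
    layer2 = fit_layer(X2, M, alpha)
    X3 = hstack(X2, layer_features(layer2, X2))
    layer3 = lasso_fit(X3, y, alpha)
    return layer1, layer2, layer3


def mllasso_predict(model, G):
    """Prediction needs markers only; omic data are used just for training."""
    layer1, layer2, layer3 = model
    X2 = hstack(G, layer_features(layer1, G))
    X3 = hstack(X2, layer_features(layer2, X2))
    return lasso_predict(layer3, X3)


def demo(seed=0):
    """Simulated toy data (hypothetical effect sizes), 45 train / 15 test lines."""
    rng = random.Random(seed)
    n, p = 60, 5
    G = [[float(rng.randint(0, 2)) for _ in range(p)] for _ in range(n)]
    T = [[g[0] + 0.5 * g[1] + rng.gauss(0, 0.3),
          g[2] - g[3] + rng.gauss(0, 0.3)] for g in G]
    M = [[t[0] - 0.5 * t[1] + rng.gauss(0, 0.3)] for t in T]
    y = [m[0] + 0.5 * t[0] + rng.gauss(0, 0.3) for m, t in zip(M, T)]
    model = mllasso_fit(G[:45], T[:45], M[:45], y[:45], alpha=0.05)
    return mllasso_predict(model, G[45:]), y[45:]


if __name__ == "__main__":
    predicted, observed = demo()
    for p_hat, obs in zip(predicted, observed):
        print(f"{p_hat:8.3f} {obs:8.3f}")
```

The point of the sketch is that the transcriptome and metabolome act as training-time supervision for the intermediate layers, while prediction for a new line requires its marker genotypes alone.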