Institute of Genomic Diversity, Cornell University, Ithaca, New York.
Plant Breeding and Genetics Section, Cornell University, Ithaca, New York.
PLoS Genet. 2021 Oct 4;17(10):e1009568. doi: 10.1371/journal.pgen.1009568. eCollection 2021 Oct.
Genomic prediction typically relies on associations between single-site polymorphisms and traits of interest. This representation of genomic variability has been successful for predicting many complex traits. However, it usually cannot capture the combination of alleles in haplotypes, and it has generated little insight into the biological function of polymorphisms. Here we present a novel and cost-effective method for imputing cis haplotype-associated RNA expression (HARE), study the transferability of HARE estimates across tissues, and evaluate genomic prediction models within and across populations. HARE focuses on tightly linked cis-acting causal variants in the immediate vicinity of the gene, while excluding trans effects from diffusion and metabolism. Therefore, HARE estimates were more transferable across tissues and populations than measured transcript expression. We also showed that HARE estimates captured one-third of the variation in gene expression. HARE estimates were used in genomic prediction models for 26 complex traits, evaluated within and across two maize panels: a diverse association panel (Goodman Association panel) and a large half-sib panel (Nested Association Mapping panel). HARE resulted in up to 15% higher prediction accuracy than control approaches that preserved haplotype structure, suggesting that HARE carried functional information in addition to information about haplotype structure. The largest increase was observed when the model was trained in the Nested Association Mapping panel and tested in the Goodman Association panel. Additionally, HARE yielded higher within-population prediction accuracy than measured expression values. Accuracy obtained with measured expression varied across tissues, whereas accuracy obtained with HARE was more stable. Therefore, imputing RNA expression of genes by haplotype is stable, cost-effective, and transferable across populations.
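The cross-panel evaluation described in the abstract (train a prediction model in one panel, test it in the other) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the data are synthetic, ridge regression stands in for whatever prediction model the authors used, prediction accuracy is taken to be the Pearson correlation between predicted and observed trait values, and the names `ridge_fit`, `prediction_accuracy`, and `simulate_panel` are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centered data.

    Returns (intercept, coefficients). `lam` is the ridge penalty.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    # Solve (Xc'Xc + lam * I) beta = Xc'yc
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta


def prediction_accuracy(y_true, y_pred):
    """Pearson correlation between observed and predicted values (assumed metric)."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


# Synthetic stand-ins for gene-level predictors (e.g. imputed expression values).
rng = np.random.default_rng(0)
n_genes = 200
true_effects = rng.normal(0.0, 1.0, n_genes)  # hypothetical gene effects on the trait


def simulate_panel(n_lines):
    """Simulate one panel: a predictor matrix (lines x genes) and a trait vector."""
    X = rng.normal(0.0, 1.0, (n_lines, n_genes))
    y = X @ true_effects + rng.normal(0.0, 5.0, n_lines)
    return X, y


X_train, y_train = simulate_panel(1000)  # plays the role of the training panel
X_test, y_test = simulate_panel(250)     # plays the role of the test panel

intercept, beta = ridge_fit(X_train, y_train, lam=50.0)
accuracy = prediction_accuracy(y_test, intercept + X_test @ beta)
print(f"cross-panel prediction accuracy (Pearson r): {accuracy:.2f}")
```

In this toy setup both panels share the same simulated effects, so accuracy is high by construction. Real panels differ in haplotype structure and allele frequencies, which is the situation in which the abstract reports the largest gain from HARE.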