Morgante Fabio, Huang Wen, Sørensen Peter, Maltecca Christian, Mackay Trudy F C
Department of Biological Sciences and W. M. Keck Center for Behavioral Biology, North Carolina State University, Raleigh, NC 27695
Program in Genetics, North Carolina State University, Raleigh, NC 27695
G3 (Bethesda). 2020 Dec 3;10(12):4599-4613. doi: 10.1534/g3.120.401847.
The ability to accurately predict complex trait phenotypes from genetic and genomic data is critical for the implementation of personalized medicine and precision agriculture; however, prediction accuracy for most complex traits is currently low. Here, we used whole-genome sequences, deep RNA sequencing data, and high-quality phenotypes for three quantitative traits in the ∼200 inbred lines of the Drosophila melanogaster Genetic Reference Panel (DGRP) to compare the prediction accuracies of gene expression and genotypes. We found that expression levels (r = 0.28 and 0.38 for females and males, respectively) provided higher prediction accuracy than genotypes (r = 0.07 and 0.15 for females and males, respectively) for starvation resistance, similar prediction accuracy for chill coma recovery (null for both models and both sexes), and lower prediction accuracy for startle response (r = 0.15 and 0.14 for female and male genotypes, respectively, and r = 0.12 and 0.11 for female and male transcripts, respectively). Models including both genotypes and expression levels did not outperform the best single-component model. However, accuracy increased considerably for all three traits when we included gene ontology (GO) category as an additional layer of information for both genomic variants and transcripts. We found strongly predictive GO terms for each of the three traits, some of which had a clear, plausible biological interpretation. For example, for starvation resistance in females, GO:0033500 (r = 0.39 for transcripts) and GO:0032870 (r = 0.40 for transcripts) relate to carbohydrate homeostasis and cellular response to hormone stimulus (including the insulin receptor signaling pathway), respectively. In summary, this study shows that integrating different sources of information improved prediction accuracy and helped elucidate the genetic architecture of three complex phenotypes.