INRAE, AgroParisTech, GABI, Université Paris-Saclay, Allée de Vilvert, 78350, Jouy-en-Josas, France.
GenPhySE, INRAE, ENVT, Université de Toulouse, 31320, Castanet Tolosan, France.
BMC Bioinformatics. 2022 Sep 6;23(1):365. doi: 10.1186/s12859-022-04914-5.
It is now widespread in livestock and plant breeding to use genotyping data to predict phenotypes with genomic prediction models. In parallel, genomic annotations related to a variety of traits are increasing in number and granularity, providing valuable insight into potentially important positions in the genome. The BayesRC model integrates this prior biological information by factorizing the genome according to disjoint annotation categories, in some cases enabling improved prediction of heritable traits. However, BayesRC is not adapted to cases where markers may have multiple annotations.
We propose two novel Bayesian approaches to account for multi-annotated markers through a cumulative (BayesRC+) or preferential (BayesRCπ) model of the contribution of multiple annotation categories. We illustrate their performance on simulated data with various genetic architectures and types of annotations. We also explore their use on data from a backcross population of growing pigs in conjunction with annotations constructed using the PigQTLdb. In both simulated and real data, we observed a modest improvement in prediction quality with our models when used with informative annotations. In addition, our results show that BayesRC+ successfully prioritizes multi-annotated markers according to their posterior variance, while BayesRCπ provides a useful interpretation of informative annotations for multi-annotated markers. Finally, we explore several strategies for constructing annotations from a public database, highlighting the importance of careful consideration of this step.
When used with annotations that are relevant to the trait under study, BayesRCπ and BayesRC+ allow for improved prediction and prioritization of multi-annotated markers, and can provide useful biological insight into the genetic architecture of traits.
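To make the distinction between the cumulative and preferential treatments of a multi-annotated marker concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a BayesR-style four-component mixture of zero-mean normals within each annotation category; the function names, category labels, and all numeric values are hypothetical and chosen only for illustration.

```python
import random

# Assumed variance multipliers of a four-component BayesR-style mixture
# (null, small, medium, large effect classes), scaled by the genetic variance.
VARIANCE_CLASSES = (0.0, 1e-4, 1e-3, 1e-2)


def draw_effect(mix_props, sigma2_g, rng):
    """Draw one marker effect from a mixture of zero-mean normals."""
    scale = rng.choices(VARIANCE_CLASSES, weights=mix_props, k=1)[0]
    if scale == 0.0:
        return 0.0  # null class: the marker has no effect
    return rng.gauss(0.0, (scale * sigma2_g) ** 0.5)


def cumulative_effect(categories, mix_by_cat, sigma2_g, rng):
    """Cumulative idea: add one contribution per annotation category."""
    return sum(draw_effect(mix_by_cat[c], sigma2_g, rng) for c in categories)


def preferential_effect(categories, pref_weights, mix_by_cat, sigma2_g, rng):
    """Preferential idea: assign the marker to one category, then draw."""
    chosen = rng.choices(categories, weights=pref_weights, k=1)[0]
    return chosen, draw_effect(mix_by_cat[chosen], sigma2_g, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical mixture proportions for two hypothetical categories.
    mix_by_cat = {
        "qtl": (0.90, 0.05, 0.03, 0.02),
        "other": (0.99, 0.005, 0.003, 0.002),
    }
    marker_cats = ["qtl", "other"]  # a marker carrying two annotations
    print(cumulative_effect(marker_cats, mix_by_cat, 1.0, rng))
    print(preferential_effect(marker_cats, [0.7, 0.3], mix_by_cat, 1.0, rng))
```

In this sketch the cumulative variant lets every annotation of a marker contribute to its effect, whereas the preferential variant routes the marker through a single category chosen with marker-specific weights; in a full Bayesian model those weights and mixture proportions would be estimated rather than fixed.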