INRAE, AgroParisTech, GABI, Université Paris-Saclay, Jouy-en-Josas 78350, France.
BioEcoAgro Joint Research Unit, INRAE, Université de Liège, Université de Lille, Université de Picardie Jules Verne, Peronne 80203, France.
G3 (Bethesda). 2021 Oct 19;11(11). doi: 10.1093/g3journal/jkab225.
Technological advances and decreasing costs have led to the rise of increasingly dense genotyping data, making it feasible to identify potential causal markers. Custom genotyping chips, which combine medium-density genotypes with a custom genotype panel, can capitalize on these candidates to potentially improve the accuracy and interpretability of genomic prediction. A particularly promising model to this end is BayesR, which divides markers into four effect size classes. BayesR has been shown to yield accurate predictions and to hold promise for quantitative trait loci (QTL) mapping in real data applications, but an extensive benchmark on simulated data is currently lacking. Based on a set of real genotypes, we generated simulated data under a variety of genetic architectures and phenotype heritabilities, and we evaluated the impact of excluding or including causal markers among the genotypes. We define several statistical criteria for QTL mapping, including several based on sliding windows to account for linkage disequilibrium (LD). We compare and contrast these statistics and their ability to accurately prioritize known causal markers. Overall, we confirm the strong predictive performance of BayesR for moderately to highly heritable traits, particularly for 50k custom data. In cases of low heritability or weak LD with the causal marker in 50k genotypes, QTL mapping is a challenge regardless of the criterion used. BayesR is a promising approach to simultaneously obtain accurate predictions and interpretable classifications of SNPs into effect size classes. We illustrated the performance of BayesR in a variety of simulation scenarios and compared the advantages and limitations of each.
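The four-class mixture that BayesR places on SNP effects, and the idea of a window statistic that pools evidence across SNPs in LD, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The class variances (0, 10⁻⁴, 10⁻³ and 10⁻² times the genetic variance σ²g) follow the standard BayesR parameterization rather than anything stated in this abstract. `sliding_window_sum` is a hypothetical window criterion (the sum of per-SNP posterior probabilities of a nonzero effect over ±w neighbouring SNPs), not one of the criteria defined in the paper. The posterior class probabilities in the usage example are made-up placeholders for what a BayesR MCMC run would report.

```python
import random

# Assumed standard BayesR scaling: each SNP effect comes from one of four
# zero-mean normals whose variances are fixed fractions of the genetic
# variance sigma2_g. Class 0 has variance 0, i.e. a point mass at zero.
CLASS_SCALES = (0.0, 1e-4, 1e-3, 1e-2)


def sample_bayesr_effects(n_snps, pi, sigma2_g, rng):
    """Draw (class, effect) pairs for n_snps markers from the BayesR mixture prior.

    pi: mixing proportions of the four classes (must sum to 1).
    """
    if len(pi) != len(CLASS_SCALES) or abs(sum(pi) - 1.0) > 1e-9:
        raise ValueError("pi must have four entries summing to 1")
    draws = []
    for _ in range(n_snps):
        # Pick an effect size class according to the mixing proportions.
        k = rng.choices(range(len(CLASS_SCALES)), weights=pi)[0]
        sd = (CLASS_SCALES[k] * sigma2_g) ** 0.5
        # Class 0 SNPs have exactly zero effect; others get a normal draw.
        effect = rng.gauss(0.0, sd) if sd > 0 else 0.0
        draws.append((k, effect))
    return draws


def nonzero_probability(class_probs):
    """Per-SNP posterior probability of a nonzero effect: 1 - P(class 0)."""
    return [1.0 - p[0] for p in class_probs]


def sliding_window_sum(values, half_width):
    """Hypothetical window criterion: sum values over +/- half_width SNPs.

    Windows are truncated at the ends of the SNP sequence.
    """
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(values[lo:hi]))
    return out


if __name__ == "__main__":
    rng = random.Random(1)
    effects = sample_bayesr_effects(1000, (0.95, 0.03, 0.015, 0.005), 1.0, rng)
    print("SNPs with nonzero effect:", sum(1 for k, _ in effects if k > 0))
    # Placeholder posterior class probabilities for four adjacent SNPs.
    post = [(0.9, 0.05, 0.03, 0.02), (0.2, 0.1, 0.3, 0.4),
            (0.5, 0.2, 0.2, 0.1), (0.95, 0.03, 0.01, 0.01)]
    pnz = nonzero_probability(post)
    print("per-SNP:", pnz)
    print("window (+/-1):", sliding_window_sum(pnz, 1))
```

Summing over a window means that when a causal effect is shared among several correlated SNPs, the region can still stand out even if no single SNP has a high probability of a nonzero effect.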