State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China.
National Research Facility for Phenotypic and Genotypic Analysis of Model Animals (Beijing), China Agricultural University, Beijing, China.
Genet Sel Evol. 2023 Oct 18;55(1):72. doi: 10.1186/s12711-023-00843-w.
Although the accumulation of whole-genome sequencing (WGS) data has accelerated the identification of mutations underlying complex traits, its impact on the accuracy of genomic predictions is limited. Reliable genotyping data and pre-selected beneficial loci can be used to improve prediction accuracy. Previously, we reported a low-coverage sequencing genotyping method that yielded 11.3 million highly accurate single-nucleotide polymorphisms (SNPs) in pigs. Here, we introduce a method termed selective linkage disequilibrium pruning (SLDP), which refines the set of SNPs that show a large gain during prediction of complex traits using whole-genome SNP data.
We used the SLDP method to identify and select markers among millions of SNPs based on genome-wide association study (GWAS) prior information. We evaluated the performance of SLDP with respect to three real traits and six simulated traits with varying genetic architectures using two representative models (genomic best linear unbiased prediction and BayesR) on samples from 3579 Duroc boars. SLDP was determined by testing 180 combinations of two core parameters (GWAS P-value thresholds and linkage disequilibrium r²). The parameters for each trait were optimized in the training population by fivefold cross-validation and then tested in the validation population. Similar to previous GWAS prior-based methods, the performance of SLDP was mainly affected by the genetic architecture of the traits analyzed. Specifically, SLDP performed better for traits controlled by major quantitative trait loci (QTL) or a small number of quantitative trait nucleotides (QTN). Compared with two commercial SNP chips, genotyping-by-sequencing data, and an unselected whole-genome SNP panel, the SLDP strategy led to significant improvements in prediction accuracy, which ranged from 0.84 to 3.22% for real traits controlled by major or moderate QTL and from 1.23 to 11.47% for simulated traits controlled by a small number of QTN.
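The two core parameters named above (a GWAS P-value threshold and a linkage disequilibrium threshold) suggest a two-step selection, which the following minimal Python sketch illustrates. It is not the authors' SLDP implementation, whose details are not given in this abstract. It assumes a generic procedure: keep SNPs whose GWAS P-value passes the threshold, then greedily drop any SNP whose squared genotype correlation with an already retained, more significant SNP exceeds the LD threshold. The function name `select_then_prune`, its arguments, and the toy data are all hypothetical.

```python
import numpy as np


def select_then_prune(genotypes, p_values, p_threshold, r2_threshold):
    """Illustrative P-value filter followed by greedy LD pruning.

    genotypes: (n_individuals, n_snps) array of allele dosages (0/1/2).
    p_values:  (n_snps,) array of GWAS P-values (assumed precomputed).
    Returns the sorted column indices of the retained SNPs.
    """
    # Step 1: keep only SNPs that pass the GWAS P-value threshold.
    candidates = np.flatnonzero(p_values <= p_threshold)
    # Visit candidates from most to least significant.
    order = candidates[np.argsort(p_values[candidates], kind="stable")]

    kept = []
    for snp in order:
        redundant = False
        for k in kept:
            # Squared Pearson correlation of dosages as a simple LD proxy.
            r = np.corrcoef(genotypes[:, snp], genotypes[:, k])[0, 1]
            if r * r > r2_threshold:
                redundant = True
                break
        # Step 2: retain the SNP only if it is not in high LD with a kept SNP.
        if not redundant:
            kept.append(int(snp))
    return sorted(kept)


# Toy data (hypothetical): 6 individuals, 4 SNPs.
toy_genotypes = np.array([
    [0, 0, 2, 0],
    [1, 1, 0, 0],
    [2, 2, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 2, 2],
    [2, 2, 0, 2],
])
toy_p_values = np.array([1e-8, 1e-6, 1e-5, 0.5])

toy_selected = select_then_prune(
    toy_genotypes, toy_p_values, p_threshold=1e-4, r2_threshold=0.5
)
print(toy_selected)
```

In this toy example the second SNP is a copy of the first and is pruned, the third passes both filters, and the fourth fails the P-value threshold. In a setting like the one described, the two thresholds would be tuned per trait by cross-validation in the training population. The pairwise loop is quadratic in the number of retained SNPs, so a real whole-genome panel would need a windowed or otherwise optimized LD computation.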
The SLDP marker selection method can be incorporated into mainstream prediction models to yield accuracy improvements for traits with a relatively simple genetic architecture; however, it has no significant advantage for traits not controlled by major QTL. The main factors that affect its performance are the genetic architecture of traits and the reliability of GWAS prior information. Our findings can facilitate the application of WGS-based genomic selection.