Liu Jiaxin, Mo Dongxin, Luo Lingyun, Shi Yilong, Xu Songsong
Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing 100193, China.
Genomics. 2025 May;117(3):111047. doi: 10.1016/j.ygeno.2025.111047. Epub 2025 Apr 19.
The reference genome plays a crucial role in uncovering genomic variations, which increase our understanding of the molecular mechanisms influencing biological traits. However, most sheep reference genomes are derived from a single individual and therefore cannot adequately represent the genetic diversity of sheep. We used a map-to-pan strategy to construct a sheep pan-genome from short-read whole-genome sequencing data of 801 samples, comprising 724 domestic individuals from 151 sheep populations/breeds and 77 wild individuals from seven species of the genus Ovis, and assembled a total of 195 Mb of nonreference sequences absent from the ARS-UI_Ramb_v2.0 reference. The nonreference sequences were annotated with the MAKER2 pipeline, which integrates ab initio gene prediction, RNA-Seq evidence, and protein homology. As a result, a total of 2678 additional genes were predicted in the nonreference sequences. We also identified 13,317 novel single nucleotide polymorphisms (SNPs) by mapping the sequences that could not be aligned to ARS-UI_Ramb_v2.0 to the nonreference sequences. Population genetic analyses based on the novel SNPs, including principal component analysis (PCA), a phylogenetic tree, and ADMIXTURE, revealed a clear phylogenetic relationship among the world's domestic sheep and their close wild relatives. Additionally, pan-genome-wide presence and absence variation (PAV) analysis showed a decreasing trend in gene number from wild to domestic populations. PAV selection analysis identified several genes whose presence frequencies changed significantly during the evolutionary history of sheep, including GZMH, NFE2L3, GPR146 and CALHM6. Functional annotation revealed that these genes are primarily associated with immune responses.
Our results highlight the value of the sheep pan-genome for identifying previously unknown genetic variations. These findings broaden our knowledge of the genetic diversity in sheep genomes and provide insight into the domestication and breeding history of sheep.
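The abstract does not state which statistic the PAV selection analysis used. The sketch below illustrates one common way to compare gene presence frequencies between wild and domestic groups and to count genes per individual, assuming a binary gene-by-sample PAV matrix, a two-sided Fisher's exact test, and a hypothetical minimum frequency change (`MIN_DELTA`). The function names, thresholds, gene names (`gene_1` to `gene_3`) and calls are all hypothetical toy values for illustration, not the authors' method or data.

```python
"""Toy sketch of a wild-vs-domestic PAV comparison (hypothetical, not the study's method).

Assumed input: a binary PAV matrix mapping gene -> presence calls (1 = present,
0 = absent), one call per sample, plus a group label for each sample.
"""
from math import comb

MIN_DELTA = 0.5  # hypothetical minimum change in presence frequency
ALPHA = 0.05     # hypothetical significance level


def presence_frequency(calls):
    """Fraction of samples in which the gene is called present."""
    return sum(calls) / len(calls)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the observed
    margins that are no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # The relative tolerance keeps tables tied with the observed one in floating point.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def genes_per_sample(pav, n_samples):
    """Number of genes called present in each sample."""
    return [sum(calls[i] for calls in pav.values()) for i in range(n_samples)]


def pav_scan(pav, groups, g1="wild", g2="domestic"):
    """Flag genes whose presence frequency differs between two groups."""
    idx1 = [i for i, g in enumerate(groups) if g == g1]
    idx2 = [i for i, g in enumerate(groups) if g == g2]
    hits = []
    for gene, calls in pav.items():
        x1 = [calls[i] for i in idx1]
        x2 = [calls[i] for i in idx2]
        f1, f2 = presence_frequency(x1), presence_frequency(x2)
        # 2x2 table: rows = groups, columns = present / absent
        p = fisher_exact_two_sided(sum(x1), len(x1) - sum(x1),
                                   sum(x2), len(x2) - sum(x2))
        if abs(f1 - f2) >= MIN_DELTA and p < ALPHA:
            hits.append((gene, f1, f2, p))
    return hits


# Toy data: six "wild" and six "domestic" samples, invented for illustration only.
GROUPS = ["wild"] * 6 + ["domestic"] * 6
PAV = {
    "gene_1": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],  # absent from the domestic group
    "gene_2": [1] * 12,                              # core gene, present everywhere
    "gene_3": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1],  # same frequency in both groups
}

if __name__ == "__main__":
    counts = genes_per_sample(PAV, len(GROUPS))
    for label in ("wild", "domestic"):
        group_counts = [c for c, g in zip(counts, GROUPS) if g == label]
        print(label, "mean genes per sample:", sum(group_counts) / len(group_counts))
    for gene, f_wild, f_dom, p in pav_scan(PAV, GROUPS):
        print(f"{gene}: wild={f_wild:.2f} domestic={f_dom:.2f} p={p:.4g}")
```

On this constructed toy matrix, only `gene_1` is flagged, and the mean number of genes per sample is lower in the domestic group (1.5) than in the wild group (2.5). A real analysis would also need to account for sequencing depth, population structure and multiple testing across thousands of genes.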