Bian Peipei, Li Jiaxin, Zhou Shishuo, Wang Xingquan, Gong Mian, Guo Xi, Cai Yudong, Yang Qimeng, Fu Jiaqi, Li Rongrong, Huang Shuhong, Luo Funong, Shah Ali Mujtaba, Lenstra Johannes A, Mwacharo Joram M, Li Ran, Ren Gang, Wang Xiaolong, Li Cong, Zheng Wenxin, Jiang Yu, Wang Xihong
Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China.
State Key Laboratory of Animal Biotech Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing 100193, China.
Mol Biol Evol. 2024 Dec 6;41(12). doi: 10.1093/molbev/msae251.
Pangenomes can facilitate a deeper understanding of genome complexity. Using de novo phased long-read assemblies of eight representative goat breeds, we constructed a graph-based pangenome of the goat (Capra hircus) and discovered 113 Mb of novel autosomal sequence. Combining this multi-assembly pangenome with low-coverage PacBio HiFi sequences, we constructed a long-read structural variant (SV) database containing 59,325 deletions, 84,910 insertions, and 24,954 other complex SV alleles. This resource allowed reliable graph-based genotyping from short reads of 79 wild and 1,148 worldwide domestic goats. Selection signal analysis of the SVs captured a novel immune-related domestication locus containing the galectin-9 gene and extra copies of the ruminant-specific galectin-9-like genes (LGALS9L), which have high tissue specificity. A segmental duplication in domestic goats generates three additional LGALS9L copies. Ancient goat genome sequences show a gradual increase in the frequency of this duplication from the Neolithic to the present. Two other newly detected SVs also have higher selection signals than adjacent SNPs: a truncated-LINE1 deletion in EDAR2 associated with cashmere production, and a VNTR-related insertion in PAPSS2 linked to high-altitude adaptation. In summary, the multi-assembly goat pangenome and long-read SV database facilitate the detection of complex variations that are important in evolution and selection.