Institute of Animal Breeding and Husbandry, Kiel University, Kiel, Germany.
Department of Animal Sciences, Georg-August-University, Göttingen, Germany.
BMC Genomics. 2022 Sep 3;23(1):631. doi: 10.1186/s12864-022-08716-0.
Structural variants and tandem repeats are relevant sources of genomic variation that are not routinely analyzed in genome-wide association studies, mainly because they are challenging to identify and genotype. Here, we profiled these variants via state-of-the-art strategies in the founder animals of four F2 pig crosses using whole-genome sequence data (20x coverage). The variants were compared at the founder level with the commonly screened SNPs and small indels. At the F2 level, we carried out an association study using imputed structural variants and tandem repeats with four growth and carcass traits, followed by a comparison with a previously conducted association study based on SNPs and small indels.
A total of 13,201 high-confidence structural variants and 103,730 polymorphic tandem repeats (with a repeat length of 2-20 bp) were profiled in the founders. We observed a moderate to high level of co-localization (r from 0.48 to 0.57) between SNPs or small indels and structural variants or tandem repeats. In the association step, 56.56% of the significant variants were not in high LD with the significantly associated SNPs and small indels identified for the same traits in the earlier study, and would thus presumably not be tagged in a standard association study. For the four growth and carcass traits investigated, many of the candidate genes already proposed in our previous studies were confirmed and additional ones were identified. Interestingly, a common pattern emerged in how structural variants or tandem repeats regulate the phenotypic traits. Many of the significant variants were embedded in or near long non-coding RNAs, drawing attention to their functional importance. The specific mechanisms through which the identified long non-coding RNAs and their associated structural variants or tandem repeats contribute to quantitative trait variation will need further investigation.
The current study provides insights into the characteristics of structural variants and tandem repeats and their role in association studies. A systematic incorporation of these variants into genome-wide association studies is advised. While not of immediate interest for genomic prediction purposes, this will be particularly beneficial for elucidating the biological mechanisms driving complex trait variation.