Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Avenue de L'Hôpital, 1, 4000, Liège, Belgium.
Walloon Breeders Association, Rue Des Champs Elysées, 4, Ciney, 5590, Belgium.
BMC Genomics. 2024 Jul 13;25(1):690. doi: 10.1186/s12864-024-10600-y.
Heritability partitioning approaches estimate the contribution of different functional classes of variants, such as coding or regulatory variants, to the genetic variance. This information improves our understanding of the genetic architecture of complex traits, including complex diseases, and can also help improve the accuracy of genomic selection in livestock species. However, these methods have mainly been tested on human genomic data, whereas livestock populations have specific characteristics, such as high levels of relatedness, small effective population size and long-range linkage disequilibrium.
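For orientation, a multi-component variance model of the kind fitted by GREML can be written as below. This is a generic textbook sketch rather than the exact parameterisation used in the study; the symbols (y, X, b, g_c, G_c, m_c) and the definition of fold enrichment as share of genetic variance divided by share of variants are assumptions introduced here for illustration.

```latex
% Assumed generic multi-component model: C functional classes, m_c variants in class c
\[
\mathbf{y} = \mathbf{X}\mathbf{b} + \sum_{c=1}^{C} \mathbf{g}_c + \mathbf{e},
\qquad
\mathbf{g}_c \sim N\!\left(\mathbf{0},\, \mathbf{G}_c\,\sigma^2_{c}\right),
\qquad
\mathbf{e} \sim N\!\left(\mathbf{0},\, \mathbf{I}\,\sigma^2_{e}\right)
\]
% Heritability attributed to class c, and its fold enrichment
\[
h^2_c = \frac{\sigma^2_{c}}{\sum_{k=1}^{C}\sigma^2_{k} + \sigma^2_{e}},
\qquad
\text{enrichment}_c = \frac{\sigma^2_{c} \big/ \sum_{k=1}^{C}\sigma^2_{k}}{m_c \big/ \sum_{k=1}^{C} m_k}
\]
```

Under this definition, an enrichment above 1 means that variants in the class explain more genetic variance than their share of the variants would suggest, and a value below 1 means they explain less.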
Here, we used data from 14,762 cows, imputed at the whole-genome sequence level for 11,537,240 variants, to simulate traits in a typical livestock population and evaluate the accuracy of two state-of-the-art heritability partitioning methods, GREML and a Bayesian mixture model. In simulations where a single functional class had an increased contribution to heritability, the estimators were unbiased but had low precision. When causal variants were enriched among variants with low (< 0.05) or high (> 0.20) minor allele frequency, or with low (below the 1st quartile) or high (above the 3rd quartile) linkage disequilibrium (LD) scores, the genetic variance had to be partitioned into multiple classes defined by allele frequency or LD score to obtain unbiased results. When multiple functional classes had variable contributions to heritability, the estimators showed higher levels of variation, and confounding between certain categories was observed. In addition, estimators for small categories were particularly imprecise. However, the estimates and their ranking were still informative about the contribution of the classes. We also demonstrated that estimating the contribution of a single category at a time, a commonly used approach, results in overestimation of that contribution. Finally, we applied the methods to phenotypes for muscular development and height and estimated that, on average, variants in open chromatin regions had a higher contribution to the genetic variance (> 45%), while variants in coding regions had the strongest individual effects (> 25-fold enrichment on average). Conversely, variants in intergenic or intronic regions showed lower levels of enrichment (0.2- and 0.6-fold on average, respectively).
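The frequency and LD-score classes and the fold-enrichment figures mentioned above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `assign_bins` and `fold_enrichment`, the "mid" label for the intermediate class, and the definition of enrichment as share of genetic variance divided by share of variants are assumptions made for illustration. Only the thresholds (MAF < 0.05 and > 0.20, LD score below the 1st and above the 3rd quartile) come from the text.

```python
import numpy as np


def assign_bins(maf, ld_score):
    """Label each variant with a MAF class and an LD-score class.

    Thresholds follow the abstract; the 'mid' class is a hypothetical
    label for everything between the low and high cut-offs.
    """
    maf = np.asarray(maf, dtype=float)
    ld_score = np.asarray(ld_score, dtype=float)

    # MAF classes: low (< 0.05), high (> 0.20), otherwise mid
    maf_class = np.where(maf < 0.05, "low",
                         np.where(maf > 0.20, "high", "mid"))

    # LD-score classes: below 1st quartile, above 3rd quartile, otherwise mid
    q1, q3 = np.quantile(ld_score, [0.25, 0.75])
    ld_class = np.where(ld_score < q1, "low",
                        np.where(ld_score > q3, "high", "mid"))
    return maf_class, ld_class


def fold_enrichment(var_components, n_variants):
    """Share of genetic variance per class divided by share of variants.

    Assumed definition: a value above 1 means the class explains more
    genetic variance than expected from its number of variants.
    """
    var = np.asarray(var_components, dtype=float)
    n = np.asarray(n_variants, dtype=float)
    return (var / var.sum()) / (n / n.sum())


if __name__ == "__main__":
    # Toy values only, not data from the study
    maf_class, ld_class = assign_bins(
        maf=[0.01, 0.10, 0.30, 0.05, 0.20],
        ld_score=[1.0, 2.0, 3.0, 4.0, 5.0],
    )
    print(maf_class, ld_class)
    print(fold_enrichment([0.5, 0.5], [10, 90]))
```

In this toy call, a class holding 10% of the variants but half of the genetic variance receives a fold enrichment of 5, which shows why a small class can be strongly enriched while contributing only a moderate share of the total variance.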
Heritability partitioning approaches should be used cautiously in livestock populations, in particular for small categories. Two-component approaches that fit only one functional category at a time lead to biased estimators and should not be used.