Department of Plant Sciences, University of California, Davis, California, United States of America.
Biostatistics Unit, Institute of Crop Science, University of Hohenheim, Stuttgart, Germany.
PLoS Genet. 2021 Aug 26;17(8):e1009762. doi: 10.1371/journal.pgen.1009762. eCollection 2021 Aug.
Advances in high-throughput genotyping have driven the development of genome-informed methods for identifying quantitative trait loci (QTL) and studying the genetic basis of quantitative variation in natural and experimental populations. For many complex traits, the underlying genetic variation is caused by the segregation of one or more 'large-effect' loci, in addition to an unknown number of loci with effects below the threshold of statistical detection. The large-effect loci segregating in populations are often necessary but not sufficient for predicting quantitative phenotypes. They are, nevertheless, important enough to warrant deeper study and direct modelling in genomic prediction problems. We explored the accuracy of statistical methods for estimating the fraction of marker-associated genetic variance (p) and the heritability for large-effect loci underlying complex phenotypes. We found that commonly used statistical methods overestimate both p and heritability. The source of the upward bias was traced to inequalities between the expected values of variance components in the numerators and denominators of these parameters. We found algebraic solutions for bias-correcting estimates of p and heritability that depend only on the degrees of freedom and are constant for a given study design. We discovered that average semivariance methods, which have not previously been used in complex trait analyses, yielded unbiased estimates of p and heritability, in addition to best linear unbiased predictors of the additive and dominance effects of the underlying loci. The cryptic bias problem described here is unrelated to selection bias, although both cause the overestimation of p and heritability. The solutions we described are predicted to more accurately describe the contributions of large-effect loci to the genetic variation underlying complex traits of medical, biological, and agricultural importance.
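The abstract does not define average semivariance, so the following is only a minimal sketch of the general idea, assuming the usual definition: half the squared difference between two observations, averaged over all pairs. For a set of observed values, this quantity equals the sample variance with an n - 1 denominator. The function name `average_semivariance` and the example values are hypothetical, and the code is not the authors' estimator of p or heritability.

```python
from itertools import combinations
from statistics import variance


def average_semivariance(values):
    """Average of half the squared difference over all unordered pairs.

    Assumption: this is the textbook pairwise definition, used here only
    for illustration; it is not the model-based estimator from the paper.
    """
    pairs = list(combinations(values, 2))
    if not pairs:
        raise ValueError("need at least two values")
    return sum(0.5 * (a - b) ** 2 for a, b in pairs) / len(pairs)


# Hypothetical phenotype values, for illustration only.
phenotypes = [2.0, 4.0, 4.0, 5.0, 7.0]

asv = average_semivariance(phenotypes)
sample_var = variance(phenotypes)  # sample variance, n - 1 denominator

print(asv, sample_var)
```

The two printed values agree up to floating-point rounding, which is why a pairwise (semivariance) view of variation can stand in for a variance on the same scale as the observations.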