Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, 72704, USA.
BMC Genomics. 2019 Jul 29;20(1):618. doi: 10.1186/s12864-019-5992-7.
Selecting an appropriate statistical significance threshold in genome-wide association studies is critical for separating true positives from false positives while limiting false negatives. Various multiple-testing correction methods have been developed to determine the significance threshold; however, these methods may be overly conservative and may increase the number of false negatives. Here, we developed an empirical formula that determines the statistical significance threshold from the marker-based heritability of the trait. To develop the formula, we used 45 simulated traits in soybean, maize, and rice that varied in both broad-sense heritability and the number of QTLs.
The formula for the significance threshold was derived from a regression equation with one independent variable, marker-based heritability, and one response variable, -log(P) values. For all species, the threshold -log(P) values increased as both marker-based and broad-sense heritability increased. Higher broad-sense heritability in these crops resulted in higher significance threshold values. Among the crop species, maize, which has lower linkage disequilibrium, had higher significance threshold values than soybean and rice.
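As a rough illustration of a regression of this kind, the sketch below fits a straight line with marker-based heritability as the predictor and threshold -log(P) values as the response, then reads a threshold off the fitted line. The linear form, the function names `fit_line` and `threshold_from_heritability`, and all data points are assumptions made for illustration; the fitted coefficients reported in the study are not reproduced here.

```python
# Hypothetical illustration only: the (heritability, threshold) pairs below
# are made-up placeholders, not values from the study.

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def threshold_from_heritability(h2, intercept, slope):
    """Predicted threshold -log(P) for a given marker-based heritability."""
    return intercept + slope * h2


# Placeholder training data: marker-based heritability of simulated traits
# and the threshold -log(P) value associated with each trait.
h2_marker = [0.1, 0.3, 0.5, 0.7, 0.9]
neg_log_p = [3.0, 3.5, 4.0, 4.5, 5.0]

intercept, slope = fit_line(h2_marker, neg_log_p)
print(threshold_from_heritability(0.6, intercept, slope))
```

A positive slope in such a fit corresponds to the reported pattern that the threshold rises with heritability.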
Our formula was less conservative and identified more true positive associations than the false discovery rate and Bonferroni correction methods.
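For context on the two comparison methods, the sketch below computes a Bonferroni threshold and a Benjamini-Hochberg false discovery rate cutoff for a small set of P-values. The P-values, the significance level of 0.05, and the function names are hypothetical and chosen only to show how the two corrections differ in stringency; they are not data or code from the study.

```python
import math


def bonferroni_neg_log10(alpha, n_tests):
    """Bonferroni threshold expressed as -log10(alpha / number of tests)."""
    return -math.log10(alpha / n_tests)


def benjamini_hochberg_cutoff(p_values, alpha):
    """Largest P-value declared significant under the Benjamini-Hochberg
    step-up procedure, or None if no test passes."""
    m = len(p_values)
    cutoff = None
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= alpha * rank / m:
            cutoff = p  # keep the largest P-value that meets its rank bound
    return cutoff


# Hypothetical P-values for ten marker-trait tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
alpha = 0.05

bonferroni_hits = [p for p in p_values if p <= alpha / len(p_values)]
bh_cutoff = benjamini_hochberg_cutoff(p_values, alpha)
bh_hits = [p for p in p_values if bh_cutoff is not None and p <= bh_cutoff]

print(bonferroni_neg_log10(alpha, len(p_values)))
print(len(bonferroni_hits), len(bh_hits))
```

In this toy set the Bonferroni correction retains fewer tests than the false discovery rate procedure, which illustrates why a strict correction can raise the number of false negatives.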