Division of Cardiovascular Medicine, Stanford University, Stanford, CA, USA.
Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA.
Sci Rep. 2021 Sep 20;11(1):18625. doi: 10.1038/s41598-021-97896-y.
With the establishment of large biobanks, the discovery of single nucleotide variants (SNVs, also known as single nucleotide polymorphisms, SNPs) associated with various phenotypes has accelerated. An open question is whether genome-wide significant SNVs identified in earlier genome-wide association studies (GWAS) are replicated in later GWAS conducted in biobanks. To address this, we examined a publicly available GWAS database and identified, for each phenotype, two independent GWAS: an earlier "discovery" GWAS and a later "replication" GWAS done in the UK Biobank. The analysis evaluated 136,318,924 SNVs (of which 6289 reached P < 5e-8 in the discovery GWAS) from 4,397,962 participants across nine phenotypes. The overall replication rate was 85.0%, although it was lower for binary than for quantitative phenotypes (58.1% versus 94.8%, respectively). From discovery to replication, SNV effect sizes decreased by 18.0% for binary phenotypes but increased by 12.0% for quantitative phenotypes. Using the discovery SNV effect size, the phenotype type (binary or quantitative), and the discovery P value, we built and validated a model that predicted SNV replication with an area under the receiver operating characteristic curve (AUC) of 0.90. While non-replication may reflect a lack of statistical power rather than genuine false positives, these results indicate which discovered associations are likely to be replicated in subsequent GWAS.
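The abstract reports three kinds of quantities: a replication rate among genome-wide significant discovery SNVs, a percent change in effect size between the two GWAS, and an AUC for predicting replication. The Python sketch below shows one way such quantities could be computed from paired summary statistics. It is not the authors' code: the `SNVPair` record and every function name are hypothetical, the replication criterion (replication P < 0.05 with a concordant effect direction) and the averaging of relative changes in |beta| are assumptions, and the abstract does not state what kind of model produced the 0.90 AUC, so `roc_auc` only scores whatever predicted probabilities such a model returns.

```python
"""Toy sketch of replication metrics like those summarized in the abstract.

Assumptions (not taken from the paper): the record layout, the replication
criterion (P < 0.05 with a concordant sign), and averaging relative changes
in |beta| to summarize the effect-size shift.
"""
from dataclasses import dataclass
from typing import List, Sequence

GENOME_WIDE_P = 5e-8  # discovery significance threshold stated in the abstract


@dataclass
class SNVPair:
    """Hypothetical record: one SNV's summary statistics in both GWAS."""
    beta_discovery: float
    p_discovery: float
    beta_replication: float
    p_replication: float


def discovery_hits(pairs: Sequence[SNVPair]) -> List[SNVPair]:
    """Keep only SNVs that reached genome-wide significance in discovery."""
    return [s for s in pairs if s.p_discovery < GENOME_WIDE_P]


def is_replicated(s: SNVPair, alpha: float = 0.05) -> bool:
    """Assumed criterion: nominal significance plus the same effect direction."""
    same_sign = (s.beta_discovery > 0) == (s.beta_replication > 0)
    return s.p_replication < alpha and same_sign


def replication_rate(pairs: Sequence[SNVPair]) -> float:
    """Fraction of discovery hits that replicate."""
    hits = discovery_hits(pairs)
    if not hits:
        raise ValueError("no genome-wide significant discovery SNVs")
    return sum(is_replicated(s) for s in hits) / len(hits)


def mean_pct_effect_change(pairs: Sequence[SNVPair]) -> float:
    """Mean percent change in |beta| from discovery to replication."""
    hits = [s for s in discovery_hits(pairs) if s.beta_discovery != 0]
    if not hits:
        raise ValueError("no usable discovery SNVs")
    changes = [
        (abs(s.beta_replication) - abs(s.beta_discovery)) / abs(s.beta_discovery)
        for s in hits
    ]
    return 100.0 * sum(changes) / len(changes)


def roc_auc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """AUC as the probability that a replicated SNV outscores a non-replicated
    one (Mann-Whitney formulation; ties count as one half)."""
    pos = [sc for lab, sc in zip(labels, scores) if lab]
    neg = [sc for lab, sc in zip(labels, scores) if not lab]
    if not pos or not neg:
        raise ValueError("AUC needs both replicated and non-replicated SNVs")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    toy = [  # made-up numbers for illustration only
        SNVPair(0.10, 1e-10, 0.08, 1e-4),   # hit, replicates
        SNVPair(-0.20, 1e-9, -0.25, 1e-6),  # hit, replicates
        SNVPair(0.05, 2e-8, 0.01, 0.40),    # hit, does not replicate
        SNVPair(0.30, 1e-3, 0.30, 1e-9),    # not a discovery hit, excluded
    ]
    print(replication_rate(toy))        # 2 of the 3 discovery hits
    print(mean_pct_effect_change(toy))  # mean of -20%, +25% and -80%
    print(roc_auc([True, True, False], [0.9, 0.6, 0.7]))
```

Because the AUC is computed from ranks alone, any classifier built on the three predictors named in the abstract (for example a logistic regression, which is itself only an assumption here) could supply the scores passed to `roc_auc`.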